Preface


The European research project DESERVE (DEvelopment platform for Safe
and Efficient dRiVE, 2012-2015) aimed to design and develop a platform
tool that copes with the continuously increasing complexity of future
embedded Advanced Driver Assistance Systems (ADAS) and the simultaneous
need to reduce their cost. For this purpose, the DESERVE platform profits
from cross-domain software reuse, standardization of automotive software
component interfaces, and easy but safety-compliant integration of
heterogeneous modules. This enables the development of a new generation
of ADAS applications that combine different functions, sensors,
actuators, hardware platforms, and Human Machine Interfaces (HMI).

This book provides a detailed overview of the different research activities 
conducted in the course of the DESERVE project. After introducing the aims of 
the DESERVE project in Chapter 1, selected achievements of the DESERVE 
project are presented in three different parts. Part I is dedicated to the ADAS 
development platform developed during the DESERVE project. 


• Chapter 2 covers the methodology and concepts that are part of the
generic DESERVE platform as the basis and key enabler for the
development of new assistance systems. It describes the entire spectrum
of aspects, e.g., modularity, interfaces, and standards, to be considered
for the use of the DESERVE platform.

• Chapter 3 describes the development of realistic models for driver
behavior as part of the DESERVE tool-chain needed for the evaluation
of complex ADAS systems and driver-vehicle-environment interactions.
The modelling system was used to simulate two different driving
scenarios.

• Chapter 4 presents component-based middleware, e.g., RTMaps and
ADTF, which supports developers of complex systems with typical
challenges such as multi-sensor support, synchronization issues, and
modularity. By means of different exemplary applications, in which
modules like simulators or prototyping systems are connected to the
middleware, the flexibility of the DESERVE tool-chain is demonstrated.




• Chapter 5 describes a model-in-the-loop approach for tuning ADAS
parameters. Using the AVL CAMEO tool, model-based design space
exploration and validation of a complex ADAS function is performed.


In Part II, ADAS applications used as test functions in the DESERVE project 
are explained. 


• Chapter 6 presents an application of deep-learning techniques for
semantic segmentation of camera images (i.e., scene labeling). After
explaining the algorithmic basics, an FPGA-based implementation is
presented and evaluated.

• Chapter 7 covers a system coupling an FPGA-based signal processing
architecture for MIMO radar with PC-based ADTF data post-processing.
The hardware-software combination maximizes processing performance
and minimizes the development time of complex systems.

• Chapter 8 describes a design space exploration for online calibration
of wide baseline stereo camera systems using sparse feature
correspondences in stereo images. Challenges in hardware implementations
of feature matching are presented and hardware-specific solutions are
discussed.

• Chapter 9 presents a first approach to arbitrating and sharing vehicle
control between driver and assistance system, based on modelling
vehicles and driver behavior and intentions. Fuzzy logic techniques are
used to implement the control sharing, and simulations allow the systems
to be tested.


Part III covers the validation and evaluation of two exemplary applications of 
the DESERVE platform. 


• Chapter 10 explores the effective design of Human Machine Interfaces
(HMI). During the DESERVE project, in-vehicle HMI solutions for
different functions were developed. The HMI design process for an
exemplary function is described in this chapter.

• Chapter 11 shows a prototype system for vehicle-in-the-loop testing of
ADAS functions that additionally analyzes the energy efficiency of the
prototyped system. Combined with multi-sensor simulation, a virtual
environment for testing ADAS functions is provided.


Further detailed information about the contributions of DESERVE can be 
found in the list of project deliverables referenced in each chapter. 

This work was supported by the European Commission under the Artemis 
Joint Undertaking in the scope of the DESERVE project. We would like to 

We hope that you will enjoy reading this book.
The DESERVE Project: Towards Future
ADAS Functions


Matti Kutila¹ and Nereo Pallaro²


¹VTT Technical Research Centre of Finland Ltd., Finland
²Centro Ricerche Fiat, Italy


1.1 Project Aim 


This book aims to outline the major innovations introduced by the DESERVE
(DEvelopment platform for Safe and Efficient dRiVe) project. The project
started in September 2012, finished in February 2015 after 3.5 years of
intensive work, and was coordinated by VTT Technical Research Centre of
Finland Ltd. The project was co-funded by the European Commission under
the ECSEL EU-Horizon 2020 programme. The project was a joint effort
of major vehicle manufacturers (Volvo, Daimler, Fiat), component suppliers
(Continental, Ficosa, AVL, Bosch, NXP, Infineon, dSPACE, ASL Vision,
Ramboll, TTS, Technolution), research institutes (VTT, ICOOR, ReLab,
INRIA, CTAG) and universities (VisLab, IRSEEM, ARMENIS, IKA, INTEMPORA,
Leibniz Universität Hannover).


VISION
DESERVE will design and build an ARTEMIS Tool Platform
based on the standardisation of the interfaces, software (SW)
reuse, development of common non-competitive SW modules,
and easy and safety-compliant integration of standardised
hardware (HW) or SW from different suppliers.


The main research question was to identify the optimal sensor solutions for
the DESERVE platform that are required by the selected ADAS functions
supporting the transition to automated vehicles. Based on the user needs
identified at the start of the development process, 22 different modules
were selected for implementation in 11 driver support applications:


• Lane change assistance system
• Pedestrian safety systems
• Forward/rearward looking system (distant range)
• Adaptive light control
• Park assistance
• Night vision system
• Cruise control system
• Traffic sign and traffic light recognition
• Map-supported systems
• Vehicle interior observation
• Driver monitoring


The project created a methodology framework for integrating embedded
hardware and software modules, which enables better interoperability
between automotive industry products and third-party aftersales
components. This approach also helps to contain the problem of
guaranteeing safety and security when new components are added to the
complex software and hardware stacks.

The initial project objectives are defined in Table 1.1, each with a
measurable form of verification of the expected results.


Table 1.1 Scientific and technical objectives

Each entry pairs a scientific and technical objective with its measurable
and verifiable form.

Objective 1: The definition and implementation of a model-driven process
for the compositional development of safety critical systems that allows
the smooth integration of existing components and functions in a new
framework.
Measurable and verifiable form: By defining an analysis methodology to
establish an industrially applicable process for exploration of design
spaces and multi-criteria constraint satisfaction, with particular regard
to safety properties.
Verification: 90% or more of the applications identified could be
developed with the proposed platform.

Objective 2: The development of an innovative embedded vehicle platform
capable of supporting the fast and reliable development of ADAS and
efficient Eco-driving functions.
Measurable and verifiable form: By implementing demonstrators for active
and passive safety of drivers and all road users in the three macro-areas
in the automotive domain such as:
• Technical, safety and efficiency impact assessment of resulting
prototypes following the evaluation methodologies identified in project
PReVAL and in line with interactIVe evaluation methodologies.
• Cost-benefit analysis.
• Evaluation of cost reduction in comparison with conventional Driver
Assistance Systems.
Verification: 90% or more of the developed applications showed more than
15% reduction in development time and cost.

Objective 3: The integration of existing vehicle sensors and actuators in
a unified SW framework for multiple safety and Eco-driving applications.
Measurable and verifiable form: Existence of a cost-effective and flexible
SW platform, able to be used with available sensors/actuators.
Verification: 90% or more of the developed applications show more than
15% reduction in development duration and cost.

Objective 4: The adaptation of the current data fusion, HMI and driver's
behaviour modules to provide suitable and harmonised middleware for the
different safety and Eco-driving functions.
Measurable and verifiable form: By applying the V-model and developing
high level services and Application Protocol Interface (API) that can be
used in a wide range of safety-related use cases. Via multi-modal HMI with
user related and driver behaviour assessment through tests in driving
simulator and in prototype vehicles.
Verification: Statistical evidence of improvement of driver acceptance
between existing (on the market) and DESERVE-developed functions.
Subjective evaluation through questionnaires.

Objective 5: The implementation of a new method and relative tools for
ADAS functions development.
Measurable and verifiable form: Existence of new tools for development of
Driver Assistance Systems, including data fusion visualisation, algorithm
development, actuation simulation, etc.
Verification: Evidence that the method is suitable for effective ADAS
developments:
• Results of the test case development
• Results of workshops with main stakeholders, OEMs and automotive
suppliers.
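
The quantitative verification criterion used for Objectives 2 and 3 (at
least 90% of the applications show more than 15% reduction) can be written
down as a short check. The following is only a sketch: the class and
function names, and the idea of comparing a "baseline" effort against a
"platform" effort per application, are illustrative assumptions and not
part of the DESERVE deliverables.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AppResult:
    """Hypothetical record for one developed application."""
    name: str
    baseline_effort: float  # time or cost with the conventional process
    platform_effort: float  # same quantity when using the platform


def reduction(result: AppResult) -> float:
    """Relative reduction; 0.20 means 20% less effort than the baseline."""
    return (result.baseline_effort - result.platform_effort) / result.baseline_effort


def criterion_met(results: Sequence[AppResult],
                  min_reduction: float = 0.15,
                  min_share: float = 0.90) -> bool:
    """True if at least `min_share` of the applications exceed `min_reduction`."""
    if not results:
        return False
    passed = sum(1 for r in results if reduction(r) > min_reduction)
    return passed / len(results) >= min_share
```

With ten hypothetical applications, nine of which reduce effort by 20% and
one by 10%, the share of passing applications is exactly 90% and the
criterion is met; with only eight passing applications it is not.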
The developed applications are tested and validated in different
demonstration vehicles to show that the DESERVE methodology is not
limited to a single vehicle type. The project demonstration vehicles are:

• two medium class passenger cars from Fiat
• research passenger car from VTT
• luxury passenger car from Daimler
• heavy goods vehicle from Volvo
• driver training truck from TTS

Additionally, tests are conducted in simulators, e.g. a simulator for
driver monitoring functions and a simulator for cruise control systems.


1.2 Project Structure 


The project was divided into 8 sub-projects (see Figure 1.1) in order to
keep the whole development chain manageable and to take the different
automotive-oriented technical challenges into account.

This project workflow also enabled a professional development process,
starting from the requirements and finishing with the validation phase.
One sub-project was dedicated to specifying and designing the DESERVE
platform and three sub-projects to its implementation.


[Figure 1.1: V-shaped arrangement of the sub-projects. SP1 Requirements
and specifications; SP2 ADAS development platform; SP3 Driver behaviour -
HMI; SP4 Test Case Functions; SP5 Integration and tests; SP6 Validation
and Evaluation; SP7 Dissemination and exploitation; SP8 Project
Management.]

Figure 1.1 The DESERVE V-shape development process.
1.3 DESERVE Platform Design


The project developed a framework methodology (see Figure 1.2) for
integrating new software components into the car environment. In
practice, the methodology was verified by implementing two alternative
solutions, which were adapted to fit the project framework design. One is
based on ADTF, which is mainly utilised by the German automotive
industry, and the other on RTMaps, which is used by the other
demonstrators. Since the aim is to introduce a solution that will be
exploited in real vehicles, having both solutions gives a good basis for
bringing the specified framework to cars within the next 5 years.


1.4 The Project Innovation Summary 


The project was not limited to the framework design but also advanced
current in-vehicle technology. The specific areas in which progress was
made are:

• Night time environment perception
• Driver monitoring topics: drowsiness and distraction detection
• Embedded in-vehicle computing system: setting up FPGA based automotive
CPUs
• Vehicle blind spot detection
• Vehicle surrounding awareness
• New human-machine interface concept

[Figure 1.2: labels recovered from the diagram: iterative process;
MicroAutoBox, FPGA, Embedded PC, Aurix, ...; cost prediction (silicon
area, electronics, etc.); HIL, MIL, PIL, test bench.]

Figure 1.2 DESERVE platform concept for speeding up the ADAS function
development time.


However, these are by-products, since the main intention was to develop
a common methodology for automotive software implementation. The project
therefore took steps forward in developing a common framework (i.e.
methodology) for bringing new functions to vehicles. This framework is
not limited to the functionalities above; they are only the first steps.

The DESERVE platform allows the co-design of software and hardware for
applications and algorithms. The whole application or algorithm can be
implemented in software using, for example, ADTF, RTMaps or Simulink
interfaces, which allows reusability, flexibility and fast verification
of the implemented hardware modules.


1.5 Conclusions 


The original project target was to develop a common software platform for
modern vehicles. The expected outcome is that the platform fits up to 90%
of all new applications introduced in new cars. Novel ADAS functions are
becoming more and more complex, and the new features are software-based
rather than the mechanical solutions of 10 to 15 years ago. However,
software is always prone to errors, which may have serious consequences
if, for example, the vehicle accelerates when emergency braking is
expected. Therefore, a proper evaluation procedure with suitable
performance indicators is needed in order to verify the correct
functionality of the platform.

As a final concluding remark, the DESERVE methodology advances the state
of practice compared to the current approaches in the automotive
industry. The architecture used for the DESERVE platform is flexible and
modular and makes it possible to add new software components, devices,
modules and functions even if the set of vehicle sensors, actuators and
HMI remains unchanged.
2.1 Introduction to the DESERVE Platform Concept


As outlined in Figure 2.1, the DESERVE platform is the key enabler for
speeding up the development of next generation ADAS systems. The DESERVE
platform represents an open platform to be used by anyone. This chapter
therefore covers the entire spectrum of aspects to be considered for the
use of this generic DESERVE platform.

Please note that the extensive work on the DESERVE platform cannot be
completely described here. Thus, references to a number of DESERVE
deliverables are made. As most of these deliverables are not publicly
available, essential findings from these deliverable reports are included
here to provide a complete view of the DESERVE platform.

The DESERVE platform relies on model-based design and virtual testing
tools. Its openness is based on compliance with AUTOSAR standards. All
AUTOSAR members have access to these standardized interfaces.
The DESERVE platform is not tied to any specific hardware or software.
Instead, it is generic and represents a new methodology and concept for
developing future ADAS systems more efficiently and more flexibly, with
maximum reuse of modules and components thanks to well-defined processes
and standardization at the architecture and encapsulated module levels.

Requirements engineering is applied for next generation ADAS systems.
By means of model-based design (e.g. Matlab/Simulink/ADTF/RTMaps), fast
implementation in an ADAS rapid prototyping framework is achieved
(development level 2). Rapid prototyping results are evaluated on a
Hardware-in-the-Loop (HIL), Model-in-the-Loop (MIL) or
Processor-in-the-Loop (PIL) test bench. In parallel, by making use of
model-based design space exploration, specifications and requirements for
the System-on-Chip (SoC) can be derived at a very early development
phase, which supports cost prediction on the basis of silicon area,
throughput etc. Both validation by virtual testing and cost prediction
indicate important improvement potentials that need to be implemented in
the next cycle of the iterative development process.

The situation before DESERVE can be characterized by the absence of
model-based access to perception and fusion algorithms, missing AUTOSAR
compatibility, and the lack of a library of available algorithms (for
composing and evaluating new algorithms). Instead, the usual approach is
to test the application on real vehicles in real traffic scenarios,
together with a recording feature that captures critical situations, for
example those in which the solution fails, so that they can be reproduced
later in the laboratory.

The objectives of the DESERVE platform are driven by the market needs,
which are enabling further growth of embedded systems and more
specifically of advanced driver assistance systems (ADAS), mastering the
complexity of ADAS (both in system architecture and processing power),
reducing the cost of components and the development time of ADAS, and
seamlessly integrating the growing number of functions within ADAS and
the corresponding vehicle.

DESERVE strives to meet these market needs by aiming at a novel design
and a more efficient development process that is enabled by a platform
which:

• provides a flexible development framework, reaching from early
PC-based pre-developments down to close-to-production hardware
implementations on final target systems on chip, to seamlessly support
the ADAS development levels;
• constructs a tool chain that allows modelling and evaluation via
virtual testing of new sensors, algorithms, applications and actuators
during the whole design and development process;
• forms a common in-vehicle platform for future ADAS functions based on
a modular approach and on architecture and interface specifications that
are compatible with AUTOSAR (accessible and easy to use also for
non-project partners);
• enables the integration of safety mechanisms for pre-certification
(generic safety requirements, e.g. for testing on public roads) and full
requirements for ASIL D according to ISO 26262 (to prepare certification
of the later target platform), as well as security mechanisms for
pre-certification of connected ADAS according to ISO 27001.

The novel design and efficient development process is based on the
well-known V-model and is fully supported by the DESERVE platform during
all phases of the process. This is illustrated in Figure 2.2.

Figure 2.2 DESERVE platform enabled design and development process.


2.2 The DESERVE Platform - A Flexible Development 
Framework to Seamlessly Support the ADAS 
Development Levels 


This section introduces the development methods and guidelines associated
with the DESERVE platform and outlines the benefits in terms of
development cost and time savings from the OEM perspective. Basically,
the platform concept is based on three pillars which reflect the
different development levels and the transition of ADAS algorithms from
the prototyping to the production phase in the automotive industry (see
Figure 2.3).

The DESERVE platform is a generic platform that supports all development
levels illustrated in Figure 2.3 as seamlessly as possible, from
feasibility study to product development.


Level 1: PC platform

In the research and pre-development phase, users typically require highly
flexible tools with an intuitive user interface, and the implementation
of ADAS algorithms does not yet need to satisfy hard real-time
requirements. Here, PC-based tools such as ADTF and RTMaps for data
fusion often constitute the basis for ADAS development.

Such tools provide a high level of user comfort and allow developers to
implement and verify algorithms directly on a standard MS Windows or
Linux PC. Different kinds of sensors/actuators and vehicle bus interfaces
are available so that the algorithms can be tested directly in a real
environment. However, real-time calculation is not guaranteed, especially
with complex perception, fusion and tracking algorithms. In addition,
there is no direct support of Matlab/Simulink, AUTOSAR and the
model-based design approach for application functions. Finally, PC
platforms as described above are typically not tailored for stand-alone,
in-vehicle use cases.


Figure 2.3 ADAS development process.
To avoid a time-consuming redesign of perception, fusion or tracking
algorithms when implementing them on the final ECU hardware (production
ECU), engineers are looking for ways to evaluate different target
hardware architectures according to given cost criteria already in early
development stages. This request is met by the design space exploration
(DSE) methodology and the SoC modelling approach.
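
As an illustration of what evaluating architectures "according to given
cost criteria" can look like, the following sketch selects among candidate
target architectures using two criteria mentioned in this chapter,
silicon area and throughput. It is a generic Pareto-filter example under
stated assumptions, not the DSE tool developed in DESERVE; all names and
numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical target architecture with two cost criteria."""
    name: str
    silicon_area_mm2: float  # lower is better
    throughput_fps: float    # higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse in both criteria and better in one."""
    no_worse = (a.silicon_area_mm2 <= b.silicon_area_mm2
                and a.throughput_fps >= b.throughput_fps)
    better = (a.silicon_area_mm2 < b.silicon_area_mm2
              or a.throughput_fps > b.throughput_fps)
    return no_worse and better


def pareto_front(cands: Sequence[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def cheapest_feasible(cands: Sequence[Candidate],
                      min_fps: float) -> Optional[Candidate]:
    """Smallest-area Pareto candidate that meets the throughput requirement."""
    feasible = [c for c in pareto_front(cands) if c.throughput_fps >= min_fps]
    return min(feasible, key=lambda c: c.silicon_area_mm2, default=None)
```

For four made-up candidates A (10 mm², 30 fps), B (20 mm², 60 fps),
C (25 mm², 50 fps) and D (40 mm², 120 fps), candidate C is dominated by B
and dropped; with a requirement of 50 fps the sketch selects B.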


Level 2: Rapid prototyping platform including software superstructure
(e.g. embedded PC/embedded controller with real-time operating system
and FPGA)

In the second development stage, engineers go one step closer to a
real-time implementation. Complex and computationally intensive
algorithms are shifted to a powerful FPGA to improve the real-time
capability. In parallel, the FPGA platform allows different target
hardware architectures to be evaluated in combination with the selected
algorithms. To ensure a rapid implementation of the above-mentioned
perception, fusion, and tracking algorithms in the FPGA, basic building
blocks are provided as a library by the DSE framework. By means of this
block-based modelling approach, the time and effort for implementing the
associated algorithms can be reduced significantly.

Using an embedded system platform that features both an FPGA and an
embedded controller in this stage also allows ADAS application algorithms
to be designed by means of models, so that the associated development
time can be reduced further. Compared to the purely PC-based framework,
real-time performance is almost guaranteed, though the user comfort when
programming the FPGA may be restricted.


Level 3: Fully embedded, AUTOSAR compatible architecture (e.g. multicore
controller with FPGA) for the evaluation of algorithms in real time and
implementation of safety requirements according to ISO 26262 (e.g.
pre-certification for testing on public roads)

The goal of this stage is to go one step further towards the final target
hardware and to provide a stand-alone, in-vehicle rapid prototyping
platform which, for example, can even be used during test drives. This
stage reflects the users' need to evaluate and experience the driver
assistance system directly in the vehicle itself.

The standard PC is replaced by an embedded PC that is qualified for
in-vehicle use in terms of shock, vibration and temperature, similar to
the other parts of the system. This platform also allows the integration
of hardware accelerators so that even highly computationally intensive
algorithms may be tested in the vehicle. It is also possible to interface
target microcontrollers of production ECUs and to run certain algorithms
there. The complete platform behaves like a prototype ECU that can be
operated by test drivers who have not been specifically instructed. For
example, the platform can be started and shut down via the vehicle's
ignition key.

The development platforms of all stages can be used together with the
model-based design space exploration approach for system on chip and the
libraries of basic building blocks for the FPGA. This closes the gap when
transferring perception, fusion and tracking algorithms from prototyping
to production, similar to the model-based design approach for application
functions using Simulink. Being able to use already tested and validated
building blocks and software modules greatly facilitates and expedites
the development process.

To support the model-based development of algorithms at all processing
layers (perception, decision making, warning and control strategies) and
to execute these algorithms in the vehicle, the DESERVE platform level 3
needs to be fully compatible with the AUTOSAR standard (note: as of
today, no certified AUTOSAR 4.0 real-time operating system including
memory protection is available; its development is not a subject of
DESERVE).

In addition, at this development level, safety mechanisms need to be
developed: according to ISO 26262, the DAS system needs to be classified
with respect to the Automotive Safety Integrity Level (ASIL). Many DAS
systems require the highest classification, ASIL D. Suitable measures are
required to fulfil the related strict requirements. As the certification
process is closely tied to the hardware, only pre-certification (e.g. for
testing of the new DAS on public roads) is possible at this development
level.
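
For readers unfamiliar with how an ASIL is assigned: ISO 26262-3 derives
it from three classes determined in the hazard analysis, namely severity
(S1 to S3), exposure (E1 to E4) and controllability (C1 to C3). The
sketch below uses the widely quoted shortcut that the standard's lookup
table is reproduced by summing the three class indices. It is an
illustration only: the function name is made up here, the class-0 cases
(S0, E0, C0) are not handled, and the standard itself remains the
authoritative source.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """Return 'QM', 'A', 'B', 'C' or 'D' for classes S1-S3, E1-E4, C1-C3.

    Uses the sum shortcut for the ISO 26262-3 determination table:
    S + E + C of 10 gives D, 9 gives C, 8 gives B, 7 gives A,
    and anything lower gives QM (quality management only).
    """
    if not (1 <= severity <= 3 and 1 <= exposure <= 4
            and 1 <= controllability <= 3):
        raise ValueError("expected S1-S3, E1-E4 and C1-C3")
    total = severity + exposure + controllability
    return {7: "A", 8: "B", 9: "C", 10: "D"}.get(total, "QM")
```

Under this scheme, only the combination of the highest severity, exposure
and controllability classes (S3, E4, C3) leads to ASIL D.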

As a result, OEMs are able to define, early and precisely enough, the
distinct requirements for the final ECU hardware and software (e.g.
required interfaces, i.e. which I/O and bus systems; computational power;
memory requirements), including the safety mechanisms (e.g. memory
protection, lockstep operation).


Level 4: Target production platform (e.g. multicore controller ECU with
integrated custom ASIC/FPGA/hardware accelerator)

On the basis of the production hardware, the final certification of the
ADAS takes place. Within the DESERVE project, the generic DESERVE
platform concept was validated. Starting with purely PC-based
development, algorithms can be moved step by step to an FPGA or embedded
controller prototyping system. In addition to the hardware concept, a
design space exploration and an analytical modelling approach for system
on chip is proposed. This software framework allows different target
hardware architectures for the implementation of perception algorithms to
be evaluated according to given cost criteria in early development
phases. The software framework is coupled to the FPGA of the DESERVE
platform. The associated workflow is supported by a library of basic
building blocks for the FPGA, from which perception algorithms can be
composed and implemented quickly.

To validate the platform concept, three different realization instances
of the generic DESERVE platform are considered in the project:

• Level 1: Purely PC-based solution

• Level 2: Mixed PC/embedded control based on the dSPACE MicroAutoBox
with FPGA framework (this platform will be used extensively for the ADAS
vehicle demonstrators)

• Level 3: Fully embedded platform based on a multicore controller plus
FPGA. This instance of the DESERVE platform provides a real-time
operating system and basic software fully compatible with the AUTOSAR
standard. Thus it is open and easy to use for all AUTOSAR members. It
will also feature the safety concepts required for ASIL D and consider
new radar/camera interfaces.


2.3 DESERVE Platform Requirements 


The next step in the definition process for the DESERVE platform
concerned the translation of the previously defined platform needs into
generic requirements for the DESERVE platform, based on a common software
architecture and suitable for the development and simulation of the 33
DAS functions investigated in the beginning.

The generic requirements for the DESERVE platform were defined using the
following approach (see deliverable D1.2.1 [1]).

The DESERVE development platform has been defined taking into account
that general requirements such as AUTOSAR compatibility [6], SPICE
compliance and functional safety (ISO 26262) [7, 8] are mandatory for
industrial use. These requirements apply to the "industrialized
platform". The generic DESERVE platform addresses a functional software
architecture based on Perception, Application and IWI platforms.


2.3.1 DESERVE Platform Framework 


The DESERVE platform has been defined taking into account general
requirements such as AUTOSAR compatibility, SPICE compliance and
functional safety (ISO 26262), which are mandatory for later industrial
use. The AUTOSAR standard comprises a set of specifications describing
software architecture components and defining their interfaces. DESERVE
aims at using AUTOSAR to integrate applications from different suppliers
inside a single processing unit.

DESERVE also aimed to comply with the SPICE standard, which represents a
set of technical standards documents for the computer software
development process and related business management functions. The ISO
26262 standard was considered in the implementation of the DESERVE
platform in order to improve safety in the development of methods and
tools. The ISO 26262 standard defines the "Functional Safety Assessment"
at the completion of the item development, with the scope of assessing
the functional safety that is achieved by the element under safety
analysis.

The baseline for DESERVE is represented by the results of past and
ongoing research projects [9, 10], and in particular of interactIVe,
which addressed the development of a common perception framework for
multiple safety applications with a unified output interface from the
perception layer to the application layer [11].

Figure 2.4 presents the DESERVE platform framework. In this generic
architecture, the perception platform processes the data received from
the sensors that are available on the ego vehicle and sends them to the
application platform in order to develop control functions and to decide
the actuation strategies. Finally, the output is sent to the IWI
platform, which informs the driver in case of warning conditions and
activates the systems related to the longitudinal and/or lateral
dynamics.

Figure 2.4 DESERVE platform framework.
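
The data flow through the three platforms can be pictured as a simple
pipeline. The sketch below is purely illustrative: the data types, the
time-to-collision rule and its thresholds (3 s for a warning, 1 s for
braking) are assumptions made for this example and do not describe a
DESERVE function or interface.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DetectedObject:
    distance_m: float
    closing_speed_mps: float  # positive when the gap is shrinking


@dataclass
class Decision:
    warn_driver: bool
    brake_request: float  # 0.0 (none) to 1.0 (full braking)


def perception(ranges_m: Sequence[float],
               closing_speeds_mps: Sequence[float]) -> List[DetectedObject]:
    """Perception platform: turn raw sensor values into an object list."""
    return [DetectedObject(d, v) for d, v in zip(ranges_m, closing_speeds_mps)]


def application(objects: Sequence[DetectedObject]) -> Decision:
    """Application platform: decide the strategy from time to collision."""
    ttc_s = min((o.distance_m / o.closing_speed_mps
                 for o in objects if o.closing_speed_mps > 0),
                default=float("inf"))
    if ttc_s < 1.0:   # hypothetical intervention threshold
        return Decision(warn_driver=True, brake_request=1.0)
    if ttc_s < 3.0:   # hypothetical warning threshold
        return Decision(warn_driver=True, brake_request=0.0)
    return Decision(warn_driver=False, brake_request=0.0)


def iwi(decision: Decision) -> List[str]:
    """IWI platform: inform the driver and/or trigger the actuators."""
    actions = []
    if decision.warn_driver:
        actions.append("warning")
    if decision.brake_request > 0:
        actions.append(f"brake {decision.brake_request:.1f}")
    return actions
```

Calling `iwi(application(perception([20.0], [10.0])))` yields only a
warning (time to collision 2 s), while an object 5 m ahead at the same
closing speed additionally produces a brake request.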


2.3.2 Generic DESERVE Platform Requirements 
(Relevant to all Development Levels) 


Different clusters of requirements were defined following the structure
of the DESERVE platform framework. Please note that each of the following
requirements was divided into sub-requirements, which are described in
detail in DESERVE deliverable D1.2.1.


General software requirements

Among others, these cover the previously mentioned software requirements
for modularity, reusability, AUTOSAR, SPICE process assessment (ISO/IEC
15504), functional safety (ISO 26262), platform independence (the
application software needs to be independent from the processing
hardware), standardized interfaces (i.e. the software needs to have
interfaces to sensors and actuators that are standardized and published),
operating system independence (cross-platform libraries are recommended),
programming language, communication technology independence, automatic
start-up/shut-down, configuration of sensor positions, software
versioning and licenses.


General hardware platform requirements 
These cover aspects such as power supply, the list of supported sensors, the
processing unit, the unit size and the number of included components.


Perception module requirements 
These requirements include 3D reconstruction of the scene in front of the
vehicle, ADASIS horizon, assignment of objects to lanes, detection of the free
space, driver monitoring, enhanced vehicle positioning, environment, front
near range perception, frontal object perception, lane course, lane
recognition, moving object classification, occupant monitoring, parking lot
detector, recognition of unavoidable crash situations, relative positioning of
the ego vehicle to the road, road data fusion, road edge detection, scene
labelling, self-calibration, side/rear object perception, traffic sign detector,
vehicle filter/state, vehicle light detector, vehicle trajectory calculation,
vulnerable road users detection and classification.

The functional architecture of the perception layer is illustrated in
Figure 2.5. Depending on the ADAS system to be realized, some of the
components in the generic perception platform architecture may be omitted
(without losing generality). The modules developed in the project to build the
demonstrators are highlighted by thicker boxes.

Figure 2.5 Perception platform functional architecture.

The perception sources are numerous and varied, so special care is required
when passing the available information on to the subsequent data processing
modules. Two main aspects have to be taken into consideration when connecting
perception sources to the DESERVE platform. First, the information content may
differ from sensor to sensor even when the same technique (e.g. radar, video
camera or ultrasonic sensor) is used. Second, depending on the physical
principle used, individual sensors may have an intrinsic lack of information
that can never be provided, independent of the effort spent to improve the
sensor performance (e.g. radar sensors can never "visually" read the content
of road signs, while video sensors can never provide direct speed
measurements).

By using the general interface descriptor approach, the data input structure
for the perception layer processing module becomes independent of the real
sensors connected to the DESERVE platform. This kind of concept has been used
in PC architectures for several years under the term hardware abstraction
layer, which completely decouples the data from the physical hardware in use.

As a result, the flexibility and scalability of the overall system improve,
and SW components that have already been developed become easier to reuse.
Improvements and changes within the subgroups (i.e. environmental sensors
or perception input processing module) can be conducted on a standalone basis
without modifying or adapting the whole data processing chain. The whole data
processing chain only needs to be adapted when the interface descriptors
between the modules have to be updated or modified due to newly emerging
needs.

As the diversity of existing environmental sensors is huge and many products
are already in series production, changing the sensor output signals is often
not possible. To connect existing sensing devices or sensors with an
IP-protected signal output to the open DESERVE platform, a workaround with
converter or breakout boxes can be applied. Using such interface
converter/breakout boxes, almost any kind of sensor system can be attached to
the standardized and abstracted input channels of the generic DESERVE platform.
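
The interface descriptor and converter idea can be illustrated with a minimal
sketch. The following Python fragment converts the output of two hypothetical
sensors into one common object description, so that the perception code only
depends on the abstract descriptor. The names PerceivedObject,
radar_converter and camera_converter, as well as the field layout and the raw
message formats, are assumptions made for illustration; they are not taken
from the DESERVE deliverables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerceivedObject:
    """Sensor-independent object description (hypothetical descriptor)."""
    distance_m: float
    speed_mps: Optional[float]   # None if the sensor cannot measure speed directly
    label: Optional[str]         # None if the sensor cannot classify what it sees


def radar_converter(raw: dict) -> PerceivedObject:
    # Hypothetical radar output: range in centimetres, range rate in cm/s.
    return PerceivedObject(raw["range_cm"] / 100.0, raw["range_rate_cmps"] / 100.0, None)


def camera_converter(raw: dict) -> PerceivedObject:
    # Hypothetical camera output: distance in metres and a class label, no speed.
    return PerceivedObject(raw["dist_m"], None, raw["class"])


def nearest_object(objects: list) -> PerceivedObject:
    """Perception-layer code that only depends on the common descriptor."""
    return min(objects, key=lambda o: o.distance_m)


objects = [
    radar_converter({"range_cm": 4250, "range_rate_cmps": -300}),
    camera_converter({"dist_m": 18.0, "class": "speed_limit_sign"}),
]
nearest = nearest_object(objects)
```

Replacing one of the sensors would only require a new converter function in
this sketch; nearest_object stays unchanged. Fields that a sensor cannot
provide for physical reasons are left empty instead of being estimated.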


Application module requirements 

The application module needs to consider the following requirements: ACC
control, activation control, advance warning generator, calculation of required
evasion trajectory, decision unit, driver intention detection, driving strategy,
intervention path determination, IWI manager, reference maneuver, situation
analysis, target selection, threat assessment, trajectory control, trajectory
planning, vehicle model and vehicle motion control.

The functional scheme of the application platform modules is depicted
in Figure 2.6. The modules are divided into clusters having the same scope.
Some of them mainly aim to select the driver intention and the
most dangerous target. Other modules execute control operations, evaluate
the current warning situation and, if necessary, decide on
specific actions. Then the type of information to provide to the driver and the
intervention strategy are decided. Finally, the kind of actuation to adopt is
provided to the IWI platform modules.

Figure 2.6 Application platform functional architecture.


IWI module requirements 
The IWI module is dedicated to meeting requirements regarding the HMI (acoustic,
displays, telltales, haptic steering wheel, haptic accelerator pedal, haptic safety
belt), actuation of external lights, lateral actuation (steering angle and steering
torque controller) and longitudinal actuation (engine acceleration controller).
The functional architecture of the IWI platform is depicted in Figure 2.7.

Different levels in the development process of ADAS require different
instances (i.e. realizations) of the generic DESERVE platform, from PC-based
(development level 1) to production hardware (development level 4). With
increasing development levels, additional requirements need to be addressed.
This principle is explained in the next two subsections.


2.3.3 Rapid Prototyping Framework Requirements 
(Development Level 2) 


This section briefly outlines the main requirements for the DESERVE rapid
prototyping platform. The main intention here is to specify a flexible and
modular rapid prototyping environment allowing ADAS-related perception,
application and intervention algorithms to be developed in short iteration
cycles and to be prototyped directly in the vehicle. In order to do so, there is a
need to connect different kinds of sensors to the development framework, to
pre-process and fuse the sensor data, to calculate the actual ADAS applications
and finally to drive the respective actuators.

Figure 2.7 DESERVE IWI platform.

Following the structure of the generic requirements in the previous section,
the rapid prototyping system requirements are divided into hardware, software
and FPGA code requirements. In addition, a distinction is made between
perception (i.e. sensor data processing) and application algorithms.


2.3.4 Additional Requirements for Embedded Multicore 
Platform with FPGA (Development Level 3) 


While the main focus of development level 2 is on the evaluation of algorithms
in real time on public roads, and thus on ADAS functionalities and their use in
the DESERVE DAS function demonstrators, levels 3 (and 4) go significantly
further in terms of fulfilling "critical" requirements like AUTOSAR
compatibility, SPICE compliance and functional safety (ISO 26262), which are
mandatory for industrial use of the platform. Due to limited resources and
limited project duration, these requirements cannot be fully implemented
in DESERVE. Nevertheless, all the work done for the "non-industrialized"
DESERVE platform can be (partly) reused or carried over to the industrialized
version of the DESERVE platform (level 4).


2.4 DESERVE Platform Specification and Architecture 


The generic platform requirements were translated into specifications, which
represent the starting point for the development of modules for the DESERVE
platform. The specifications were included in an Excel file which is
accessible to all project partners via the project server. By means of an
iterative process, both specifications and software design were refined and
improved. A summary of the specification approach and of the specifications
derived from the DESERVE platform requirements is provided in deliverable
D1.3.1 [2].


2.4.1 DESERVE Platform Architecture 


The architecture of the DESERVE development platform shall follow both the
principle of standard DAS development cycles and the mappings of application
building blocks to final, often heterogeneous hardware implementations.
To date there is no tool or framework available that covers both requirements
at the same time on the same platform.

In the early concept and implementation phase, the basic development,
specification and validation (e.g. with MIL, SIL or HIL) is often done with
a different development framework (both for SW and HW) than the one applied
for the final target platform. Little is known or taken into account about the
characteristics of the final embedded system when the first application
algorithms are programmed, and very often the SW modules written in this first
development environment have to be reprogrammed from scratch when porting them
to the embedded system on chip. Whether the software, mostly written in a
high-level programming language, finally fits the target system selected for
series production is a matter of pure chance, and not rarely a larger target
system or some "add-ons" have to be chosen during the series product
development cycle. With the new design space exploration methodology, the
certainty of selecting a suitable embedded target system at the first attempt
is significantly increased.

The DESERVE development platform architecture has to comply with the
following basic needs:


• Enough flexibility to encompass different development environments
in a common, seamless framework for both the high-level algorithm
development and the easy porting of these SW modules to the embedded
target platform.

• Real-time recording and playback capabilities for both the high-level and
embedded system implementations.

• A communication architecture that is capable of shifting SW portions
from the high-level development side to the embedded target system
as required (i.e. bypassing with HW accelerators).

• Seamless interoperability and replacement between the high-level
(i.e. PC-based) and embedded target systems, both for development and
validation purposes.


The basic idea and intention of this hardware architecture is to standardize the
interfaces between the three different development concept levels as far as
possible.

Inputs from proprietary ADAS sensor systems and information sources
are fed via generic interface no. 1 into the PC-based development
environment for analysis. Here the ADTF tool with its filter programming
concept is used to develop or improve SW modules in a high-level programming
language. The partitioning and optimization of parts of the SW modules is
subsequently done by shifting such portions over generic interface no. 2
to the embedded controller framework, which is already much nearer to the final
commercial product. Via this bidirectional interface, bypassing techniques like
PIL (embedded Processor In the Loop) can be realized. In a final step, dedicated
HW accelerators can be linked in via generic interface no. 3 by applying
the same bypassing concept. In particular, computationally intensive tasks can
be "outsourced" in this way, so that even the PC-based platform is capable of
meeting the stringent real-time constraints.

Depending on the performance of the PC, either all or only specific parts of
the SW modules can be executed there. During the development process, more
and more SW parts are transferred to the HW accelerator level, which, in
the final development stage, results in the next-generation embedded ADAS
target system. At this last development step, the level 1 (PC) and level 2
(embedded controller) platforms will only serve as a shell to keep up the overall
development framework.

Existing components from former ADAS generations may be reused in the early
development phase as HW accelerators for computationally intensive
calculations. Standard algorithms that are fixed and receive no further
modifications are the preferred candidates for such specific HW accelerators.

Figure 2.8 DESERVE platform (e.g. for development level 2 - rapid prototyping system
based on mixed PC and embedded controller framework).


This section summarizes the DESERVE platform architecture aspects. It 
considers hard- and software architecture aspects. The platform architecture 
is described in detail in deliverable D25.2 [4]. 


2.4.1.1 Hardware architecture 

DESERVE has to be flexible enough to be implemented in a distributed and 
scalable architecture (several modules, each of them able to sense and/or 
process and/or actuate) or a concentrated one (sensors and actuators all linked 
with a single unit of processing and control). Task 2.5.1 identifies which 
conditions have to be satisfied by the individual subsystem architectures in 
order to be compliant with the DESERVE generic hardware platform. 

For maximum reusability, the DESERVE concept and hardware architecture
were designed in such a way that subsystems of different generations
(or their respective kernels) can be used in parallel, thereby enabling
the rapid and effective creation of next-generation innovative ADAS systems
by using well-tested and certified kernel functions of the "old" system, which
could already be partly implemented as SoC (System on Chip). The DESERVE
development platform can be seen as a flexible rapid prototyping environment
that enables fast and efficient development of next-generation ADAS functions
in a continuous iteration cycle between the current and next-generation
embedded subsystem components.

Furthermore, the DESERVE concept is flexible enough for different
DESERVE partners to make different implementations. These would be of
forms that might in future be interoperable, although DESERVE will not
attempt to define the detailed standards which would be necessary for actual
interoperability.

The main DESERVE idea concerns the use of one common platform
system (Figure 2.9) for all ADAS functional modules, instead of the current
approach of having one platform for each individual ADAS system. Basically,
three main hardware architecture challenges arise from this idea:


• Automotive quality: The platform needs to provide high reliability over
the complete automotive temperature range, power supply and environmental
conditions. As ADAS systems address safety aspects, the platform
should implement the ISO 26262 requirements as far as possible, i.e. at
least the hardware components that are near to the final product unit shall
support the required ASIL level.

• Possibility to extend hardware capabilities: The platform needs to be
designed up-front to support the possibility of including additional
hardware in the system. Standard sensor interfaces are needed, for instance,
but standardized interfacing to external FPGA/DSP for performance
enhancement is also required. For scalability purposes, such external devices
need to be cascadable. Similar considerations hold for the memory
interface capability. A special case of hardware extension capabilities is
the reuse of series-production parts from earlier generations to speed up the
development process or to increase the sensor perception by placing more
sensors on the car.

• Seamless environment tool chain: One key requirement lies in the reuse of
the existing tool ecosystem over several platform generations. Further, the
tools should be adaptable to broad industry use cases, e.g. next-generation
video and radar sensors. Additionally, real-time monitoring and debugging
of interfaces and processing for development purposes represent key
challenges.


2.4.1.2 Software architecture 

As for hardware architecture, the characteristics and constraints that the 
software architecture has to fulfill to accept an application based on modules 
developed inside the DESERVE platform (Figure 2.10) were identified. 
AUTOSAR standards were considered. 


Note: Being a research project, the development work conducted in DESERVE is
exempt from being fully compliant with the AUTOSAR standard. Where possible
and easy to implement, inputs from AUTOSAR were of course considered. Full
AUTOSAR compliance is, however, not a mandatory requirement.

Figure 2.10 DESERVE platform architecture.


The key architecture challenges are: an AUTOSAR standards architecture
for the full platform system including performance accelerators; high SW
reusability and testability, including reuse of older-generation software
blocks; fast time to market; a highly optimized library for optimal
performance; automatic code generation; a standard compiler/tool chain; and
finally, hardware, tool and software support for real-time debugging,
high-speed parallel sensor data capture for validation, and on-system
debugging.


Application Software Modules
On the basis of the AUTOSAR standard, the general software architecture can
be represented in three main layers: low level (basic software: this level
abstracts from the hardware and provides basic and complex drivers and services
for the high level, i.e. memory, I/O), middle level (virtual function bus and
runtime environment) and high level (application software components).
The AUTOSAR standard introduces two architectural concepts (with respect
to other embedded software architectures) that facilitate
infrastructure-independent software development, namely the Virtual Function
Bus (VFB) and the Runtime Environment (RTE), which are closely related to each
other.

In order to realize this degree of flexibility with respect to the underlying
infrastructure, the AUTOSAR software architecture follows several abstraction
principles. In general, any piece of software within an AUTOSAR
infrastructure can be seen as an independent component, while each AUTOSAR
application is a set of inter-connected AUTOSAR components.

Further, the different layers of abstraction allow the application designer
to disregard several aspects of the physical system on which the application
will later be deployed, such as the type of microcontroller, the type of
ECU hardware, the physical location of interconnected components, the
networking technology/buses or the instantiation of components/number of
instances.

The middle level, VFB (Figure 2.11), provides generic communication
services that can be consumed by any existing AUTOSAR software component.
All of these services are virtual; in a later development phase they will be
mapped to actual implemented methods that are specific
to the underlying hardware infrastructure. The RTE (runtime environment)
provides an actual representation of the virtual concepts of the VFB for one
specific ECU.
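
The relation between virtual connections and their concrete mapping can be
illustrated with a toy model. The sketch below is not AUTOSAR code and does not
use any AUTOSAR API; the class names VirtualFunctionBus and LocalRuntime and
the port names are hypothetical. It only shows the idea that connections are
declared independently of the transport, and that a runtime for one ECU later
decides how a write on a sender port reaches the receivers (here by a direct
function call).

```python
from collections import defaultdict


class VirtualFunctionBus:
    """Toy stand-in for the VFB: it only records which ports are connected."""

    def __init__(self):
        self.connections = defaultdict(list)   # sender port -> receiver ports

    def connect(self, sender_port: str, receiver_port: str) -> None:
        self.connections[sender_port].append(receiver_port)


class LocalRuntime:
    """Toy stand-in for an RTE on one ECU: maps connections to direct calls."""

    def __init__(self, bus: VirtualFunctionBus):
        self.bus = bus
        self.handlers = {}                     # receiver port -> callback

    def register(self, receiver_port: str, handler) -> None:
        self.handlers[receiver_port] = handler

    def write(self, sender_port: str, value) -> None:
        # Deliver the value to every receiver connected to the sender port.
        for receiver_port in self.bus.connections[sender_port]:
            self.handlers[receiver_port](value)


bus = VirtualFunctionBus()
bus.connect("LaneDetection.lane_offset", "LaneKeeping.lane_offset")

received = []
rte = LocalRuntime(bus)
rte.register("LaneKeeping.lane_offset", received.append)
rte.write("LaneDetection.lane_offset", 0.35)
```

A different runtime class could map the same connections to a network bus
without any change to the components or to the connection list.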

An AUTOSAR software component in general is the core of any
AUTOSAR application. It is built as a hierarchical composition of atomic
software components. The AUTOSAR software component can be divided into
Application Software Component and AUTOSAR Interface. It is important
for DESERVE to preserve (and build up during the prototyping phase of the
applications) the AUTOSAR modularity concept. Consequently, DESERVE
focuses on the development of modular Application Software Components.

Figure 2.11 Overview of the principles of virtual interaction using AUTOSAR.


Multi-task option to permit adding and removing of functionalities

Modularity is one of the most important directives in the design of a global
architecture, with its functions and modules, for embedded systems. Multiple
tasks (called processes) can be executed by sharing common processing
resources in the same CPU. To this end, languages that support
multi-threading, such as C++, are used by developers around the world.

The software environments used in the DESERVE platforms (e.g. ADTF
and RTMaps) are able to integrate functions already programmed in C and
C++. These tools are multi-sensor software environments, designed for fast and
robust implementation in multitask systems. They use functional blocks (called
components) to exchange data of different types between modules: video,
audio, byte streams and CAN frames, among others.

This multi-threaded architecture allows the use of multiple asynchronous
sensors within the same application (see the RTMaps and ADTF sections in
D1.3.2 [3]). Moreover, these tools take advantage of multi-processor
architectures for more computing power.
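
How several asynchronous sensors can feed one application may be sketched with
plain Python threads. This is only an illustration of the principle and does
not use the RTMaps or ADTF APIs; the sensor names, periods and sample counts
are invented for the example. Each simulated sensor runs in its own thread at
its own rate and timestamps its samples, and the application merges the
streams by timestamp.

```python
import queue
import threading
import time


def sensor(name: str, period_s: float, count: int, out: queue.Queue) -> None:
    """Simulated sensor thread that emits `count` timestamped samples."""
    for i in range(count):
        out.put((time.monotonic(), name, i))
        time.sleep(period_s)


samples = queue.Queue()
threads = [
    threading.Thread(target=sensor, args=("camera", 0.004, 5, samples)),
    threading.Thread(target=sensor, args=("radar", 0.002, 10, samples)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Drain the queue and order all samples by their acquisition timestamp.
merged = sorted(samples.get() for _ in range(samples.qsize()))
per_sensor = {
    name: sum(1 for _, n, _ in merged if n == name)
    for name in ("camera", "radar")
}
```

The timestamp is taken when the sample is produced, not when it is consumed,
so the merged order reflects the acquisition order even if the consumer runs
later.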

Based on the Development Platform Requirements [1], there are three main
stages in the control architecture: perception, application and IWI platform.
The goal of the DESERVE approach is to add different functions (multi-task)
in the same platform.


2.4.2 DESERVE Platform Interface Definition 


The definition of the DESERVE interface architecture is described together
with state-of-the-art ADAS interfaces and next-generation interfaces in
deliverable D2.5.4 [5]. Due to the high relevance of the interface architecture
for the DESERVE platform concept, a brief description is included in the next
paragraphs.


2.4.2.1 Definition of DESERVE interface architecture 

The definition of the interface architecture plays a central role for the
communication and data exchange between the different DESERVE platform
modules and sensor components. In DESERVE deliverable D2.2.1 [12],
the abstracted interface descriptors are already defined on a content-based
hierarchical level. With a standardized data flow between the
numerous platform modules, development time is reduced and the performance
and scope of the encapsulated modules can be extended
efficiently and in a well-structured way. The architecture of the interface has
to be defined individually for each of the existing OSI layers, starting from
the physical layer up to the application layer.

For modules that only communicate within the same hardware unit, the
physical and data communication layers are no longer needed. Instead, a
message-box-oriented data transfer link is proposed for usage in the DESERVE
project. The data to be transmitted is written into a predefined message box
descriptor field, and message flags trigger the synchronization and data
updates in the modules concerned. The message box principle is sketched in
Figure 2.12.
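
A minimal software analogue of the message box principle is shown below. It is
a sketch under stated assumptions, not the DESERVE implementation: the class
name MessageBox, the single-slot layout and the example payload are
hypothetical. The writer stores data in the box and sets a flag; the reader
checks the flag, takes the data and clears the flag, so that each update is
consumed once.

```python
import threading


class MessageBox:
    """Single-slot message box: a descriptor field plus an 'updated' flag."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = None
        self._updated = False      # message flag signalling fresh data

    def write(self, data) -> None:
        with self._lock:
            self._data = data
            self._updated = True

    def read_if_updated(self):
        """Return (True, data) once per update, otherwise (False, None)."""
        with self._lock:
            if not self._updated:
                return False, None
            self._updated = False
            return True, self._data


box = MessageBox()
box.write({"object_id": 7, "distance_m": 31.5})
first = box.read_if_updated()    # fresh data is available
second = box.read_if_updated()   # nothing new since the last read
```

The lock keeps the data field and the flag consistent when writer and reader
run in different threads of the same hardware unit.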

The interfacing concept of the AUTOSAR standard is considered and 
incorporated in the DESERVE platform where useful and appropriate. The 
AUTOSAR mode of operation, as depicted in Figure 2.13, fits already quite 
well with the general DESERVE approach proposed in this document. 

Figure 2.12 Message box principle for intra-unit communication.

Figure 2.13 AUTOSAR application software concept.

In order to achieve good reusability of embedded software functions, it
has proven efficient in the industry to separate the "function software"
from the parameters defining the behavior of the software (the calibration
data). This allows "embedded systems suppliers" (e.g. Continental, Bosch or
others) to produce embedded systems with generic software functionalities. Such
systems are bought by OEMs for building their ADAS systems. The OEM
can adapt the generic function to the individual behavior relevant for its
customers "just by calibration". In this process, the calibration data can be
changed via an application system (INCA, for example, is the market leader)
while the embedded system is running, regardless of whether it is simulated on
a PC or already running on the target hardware. The separation of calibration
data and function software is also allowed according to the AUTOSAR concept.
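
The separation of function software and calibration data can be illustrated
with a short sketch. The distance rule, the parameter names time_gap_s and
min_distance_m and the two parameter sets are assumptions chosen for
illustration only; they do not describe a specific supplier function or
calibration tool.

```python
from dataclasses import dataclass


@dataclass
class Calibration:
    """Calibration data: tunable without touching the function software."""
    time_gap_s: float        # desired time gap to the vehicle ahead
    min_distance_m: float    # lower bound for the distance at standstill


def desired_distance(ego_speed_mps: float, cal: Calibration) -> float:
    """Generic function software: desired following distance in metres."""
    return max(cal.min_distance_m, ego_speed_mps * cal.time_gap_s)


# Two hypothetical calibrations of the same, unchanged function.
comfort = Calibration(time_gap_s=2.0, min_distance_m=5.0)
sporty = Calibration(time_gap_s=1.5, min_distance_m=4.0)

d_comfort = desired_distance(30.0, comfort)   # 60.0 m
d_sporty = desired_distance(30.0, sporty)     # 45.0 m
```

Because the function only reads its parameters from the Calibration object, the
values can be exchanged while the function code itself stays untouched.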


2.4.2.2 Existing ADAS interfaces 
All electronic embedded systems used to control vehicle functions (specifically
ADAS) need communication networks and protocols to manage all the
process information. The modules receive input information from a network
of sensors (e.g. for engine speed, lasers, cameras, etc.) and send commands
to the control stage (application platform in DESERVE), and finally to the
actuators or warning systems that execute the commands (IWI platform) [1].

Due to the increasing complexity of modern ADAS applications, point-to-point
wiring has been replaced by multiple networks and communication
protocols. These protocols use different physical media to provide safe
connections among components in the vehicle. These include single wires,
twisted wire pairs, optical fiber cables, and communication over the vehicle's
power lines.


Communication protocols 
Some of the best-known and most widely used communication protocols and
standards in today's vehicles are:


• CAN (controller area network)
• VAN (vehicle area network)
• FlexRay
• LIN (local interconnect network)
• SAE J1939 and ISO 11783
• MOST (Media-Oriented Systems Transport)
• Keyword Protocol 2000 (KWP2000)


Recent vehicles have multiple networks installed (with different protocols) to
communicate among the electronic control units (ECUs) on board. The networks
are isolated from one another for several reasons, including bandwidth and
integration concerns.


Existing interface standards 

Current ADAS systems are designed and built to provide a dedicated answer
to specific functionalities. Most ADAS include the sensor itself and the
processing unit in the same box. The raw data provided by the sensor
(camera, radar) are therefore loaded directly into the ECU and processed there.
Only high-level (processed) information is available on the communication
buses. Raw data (e.g. pixel information of images) is not available.

The ADAS modules are dedicated products which communicate mainly
within the same hardware unit. Nevertheless, to adjust the algorithms as a
function of the vehicle status, it is necessary to provide the ADAS modules with
some vehicle information such as speed, yaw rate, direction indicator status, etc.

To manage the acquisition of vehicle information and the sending of the outputs,
various communication interfaces are available, depending on the product,
e.g. CAN or FlexRay communication interfaces.

The communication bandwidth requirements keep increasing as applications
become more complex, and the existing networks are not specified to
cover these increasing demands for bandwidth. Taking the price of Ethernet
into account as well, Ethernet seems to be an alternative to the existing
communication hardware.


2.4.2.3 Definition of next generation interfaces 

The definition of next-generation high-speed sensor interfaces is the key
to enabling improvements in next-generation driver assistance systems. An
optimized interface leads to optimized data flow and system performance. Each
sensor family (camera/RADAR) needs a dedicated interface.


Parallel camera interface (CIF)

The Camera Interface (CIF) represents a complete video and still picture input
interface transferring data from an image sensor into video memory. Furthermore,
several hardware blocks performing image processing operations on
the incoming data are provided (Figure 2.14).

Figure 2.14 Camera Interface (CIF) overview.


Apart from providing the physical interfacing to various types of camera
sensor modules, the CIF block implements image processing and encoding
functionalities. The integrated image processing unit supports image sensors
with integrated YCbCr processing. Additionally, the CIF supports the
transfer of RAW (e.g. Bayer pattern) images and non-frame-synchronized
data packets. The CIF block features a 16-bit parallel interface. All output
data are transmitted via the memory interface to a BBB (Back Bone Bus)
system using the master interface. Programming of the CIF is done by register
read/write transactions using a BBB slave interface.

The CIF provides a sensor/camera interface for a wide variety of video
applications, and it is optimized for high-speed data transmission at
low power consumption. This module is designed for the following
use cases: video capturing/encoding, still image capturing in YCbCr with
on-the-fly JPEG encoding, and RAW frame data capturing.

The CIF requires fast system memory for image storage in either planar,
semi-planar or interleaved YCbCr or RAW planar format, or as JPEG-compressed
data. The JPEG encoding engine should be able to generate a full
JFIF 1.02 compliant JPEG file that can be displayed directly by any image
viewer. Important YCbCr formats, which are used for video compression
(e.g. MPEG4) for instance, are supported. For on-the-fly encoding, macroblock
line interrupts are generated to trigger video encoding.


Serial RADAR interface (RIF) 

Analog-to-digital converter (ADC) sample rates have been increasing steadily
for years to accommodate newer bandwidth-hungry applications in communication,
instrumentation, and consumer markets. Coupled with the need to
digitize signals early in the signal chain to take advantage of digital signal
processing techniques, this has motivated the development of high-speed ADC
cores that can digitize at clock rates higher than 100 MHz to 200 MHz with 8
to 12 bit resolution.

In standalone converters, the ADC needs to be able to drive receiving logic
and the accompanying PCB trace capacitance. Current switching transients due
to driving the load can couple back to the ADC analog front end, adversely
affecting performance. One approach to minimize this effect has been to
provide the output data at one-half the clock rate by multiplexing two output
ports, reducing required edge rates, and increasing available settling time
between switching instants.
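
The effect of distributing the output over two ports can be made concrete with
a small calculation. The sketch below assumes a 200 MHz, 12 bit converter as an
example operating point from the range mentioned above; the function names are
hypothetical. It computes the raw output data rate and shows how consecutive
samples are distributed alternately over two ports, each of which then switches
at half the clock rate.

```python
def output_data_rate_bps(sample_rate_hz: float, resolution_bits: int) -> float:
    """Raw output data rate of a converter in bits per second."""
    return sample_rate_hz * resolution_bits


def demultiplex(samples: list) -> tuple:
    """Distribute consecutive samples alternately over two output ports."""
    return samples[0::2], samples[1::2]


sample_rate_hz = 200e6      # example clock rate (assumption for this sketch)
resolution_bits = 12        # example resolution (assumption for this sketch)

total_rate = output_data_rate_bps(sample_rate_hz, resolution_bits)   # 2.4e9 bit/s
port_word_rate_hz = sample_rate_hz / 2   # each port updates at half the clock rate

port_a, port_b = demultiplex([10, 11, 12, 13, 14, 15])
```

The total data rate is unchanged by the multiplexing; only the update rate per
port is halved, which is what relaxes the edge rates and settling time.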


Use of LVDS for ADC high speed data output 

A new approach to providing high-speed data outputs while minimizing
performance limitations in ADC applications is the use of LVDS (low voltage
differential signaling). Infineon is incorporating LVDS output capability in
the ADCs of new RF devices and will include LVDS input capability in its new
microcontroller designs.


Standards 

Two standards have been written to define LVDS. One is ANSI/TIA/EIA-644,
which is titled "Electrical Characteristics of Low Voltage Differential
Signaling (LVDS) Interface Circuits." The other is IEEE Standard 1596.3,
which is titled "IEEE Standard for Low-Voltage Differential Signals (LVDS)
for Scalable Coherent Interface" (SCI).


Generic interface to communicate between ADTF project and FPGA-based
hardware platform

In order to allow easy and standardized communication between an ADTF
project and the FPGA-based hardware platform, a generic interface is used.
The generic interface realizes the communication with the different processing
elements implemented in the FPGA-based hardware platform in a way that is
transparent to the user.


2.5 Safety Standards and Certification Concepts 


Some concepts related to modular certification have already been adopted by
current standards and have thus found their way into the state of the practice.
This is particularly true for automotive systems, because the trend towards
modularized architectures has been particularly strong in this field.


2.5.1 Safety Impact of DESERVE


Modularization of a common ADAS platform has a clear impact on safety.
Modules will interact, for example through Missed Trigger Interaction,
Shared Trigger Interaction, Sequential Action Interaction and/or Looping
Interaction.

Module interaction implies that any change in the operation of one module
(feature) can be attributed in part or in whole to the presence of any
other module (feature) in the operational environment, as illustrated in
Figure 2.15.


2.5.2 Functional Safety of Road Vehicles (ISO 26262) 


The international standard ISO 26262 for the functional safety of road
vehicles contains the concept of the Safety Element out of Context (SEooC).

Figure 2.15 Module interaction implies changes in system behavior (figure
labels: Decision Making (Services), Data Sources (Sensors)).

A SEooC is defined as a component for which there is no single predetermined
application in a specific system. Therefore, the SEooC developer does not
know the concrete role the product has to play in the safety concept.
Subsystems, hardware components, and software components may be developed
as SEooCs. Typical software SEooCs are reusable, application-independent
components such as operating systems, libraries, or middleware in general.

For SEooC development, the standard suggests specifying assumed safety
requirements and developing the SEooC according to these requirements.
When the SEooC is to be used in a specific system, the system developer has
to specify the demanded requirements, which can subsequently be checked
against the assumed requirements. If the demanded requirements match the
guaranteed (assumed) requirements, system and component are compatible.
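
The compatibility check described above can be illustrated with a minimal
sketch. The requirement identifiers, the dictionary layout and the helper
`unmatched_requirements` are hypothetical and are not taken from ISO 26262
or from the DESERVE tool-chain; the sketch only assumes that each requirement
carries an ASIL and that an assumed requirement covers a demanded one if its
ASIL is at least as high.

```python
# Hypothetical sketch: compare demanded requirements with SEooC assumptions.
ASIL_ORDER = ["QM", "A", "B", "C", "D"]  # increasing integrity level


def unmatched_requirements(demanded, assumed):
    """Return the demanded requirements that the SEooC does not cover.

    Both arguments map a requirement identifier to an ASIL string.
    """
    missing = []
    for req, level in demanded.items():
        guaranteed = assumed.get(req)
        if guaranteed is None:
            missing.append(req)  # requirement was never assumed
        elif ASIL_ORDER.index(guaranteed) < ASIL_ORDER.index(level):
            missing.append(req)  # assumed, but at a lower ASIL
    return sorted(missing)


# Illustrative (made-up) requirement sets.
assumed = {"REQ-WATCHDOG": "D", "REQ-ECC": "D", "REQ-MEM-PROTECTION": "B"}
demanded = {"REQ-WATCHDOG": "D", "REQ-MEM-PROTECTION": "C"}

missing = unmatched_requirements(demanded, assumed)
print("compatible" if not missing else f"not covered: {missing}")
```

An empty result corresponds to the case in which system and component are
compatible; any returned identifier marks a requirement that would have to be
handled by the system developer.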

The standard does not provide any suggestions or methods on how to
identify safety requirements so as to increase the chance that assumed
and real requirements will actually match. It specifies only a relatively
coarse-grained process for embedding a SEooC development into the
standard's safety lifecycle. This approach deals with hierarchical
modularization, since it focuses on the SEooC's role as a sub-component of
a system.

In general, integration of the SEooC is expected to be done at development
time, and thus there is no explicit support for open systems where components
are to be integrated dynamically.


2.5.3 Guidelines Related to ISO 26262 


ISO 26262 is a derivative of IEC 61508, the generic functional safety standard
for electrical and electronic (E/E) systems. Ten parts make up ISO 26262.
It is designed for series production cars and contains parts specific to
management, the concept and development phases, production, operation, service
and decommissioning.

ISO 26262 requires the application of a "functional safety approach",
starting from the preliminary vehicle development phases and continuing
throughout the whole product lifecycle.

The DESERVE project focuses on the concept and development (at system,
hardware and software level) phases of the lifecycle. During these phases, the
main steps defined by the standard are:


Item definition: the item has to be identified and described. To have a
satisfactory understanding of the item, it is necessary to know about its
functionality, interfaces, and any relevant environmental conditions.

Hazard analysis and risk assessment: to evaluate the risk associated
with the item under safety analysis, a risk assessment is required. The risk
assessment considers the functionality of the item and a relevant set of
scenarios. This step produces the ASIL (Automotive Safety Integrity Level)
and the top level safety requirements.

The ASIL is one of the key concepts in ISO 26262. The intended
functions of the system are analyzed with respect to possible hazards. The
ASIL addresses the question: “If a failure arises, what will happen to the
driver and to associated road users?”

The risk of each hazardous event is evaluated on the basis of the frequency of
the situation (or “exposure”), the impact of possible damage (or “severity”)
and controllability.

The ASIL is standardized on the following scale: QM (quality management,
no risk) and A, B, C, D (increasing risk, with D being the most demanding). The
ASIL shall be determined without taking into account the technologies used
in the system. It is purely based on the harm to the driver and to the other
road users.
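
As an illustration of how severity, exposure and controllability combine into
an ASIL, the following minimal sketch may help. It assumes the class indices
S1 to S3, E1 to E4 and C1 to C3 and uses a commonly cited additive shortcut
for the ASIL determination table of ISO 26262-3; the function name
`determine_asil` is hypothetical, and the table in the standard remains the
authoritative source.

```python
def determine_asil(severity, exposure, controllability):
    """Map class indices (S1-S3, E1-E4, C1-C3) to QM or ASIL A-D.

    Assumption of this sketch: the additive shortcut, in which the sum of
    the three class indices selects the level (10 -> D, 9 -> C, 8 -> B,
    7 -> A, anything lower -> QM).
    """
    if not (1 <= severity <= 3 and 1 <= exposure <= 4
            and 1 <= controllability <= 3):
        raise ValueError("expected S in 1..3, E in 1..4, C in 1..3")
    total = severity + exposure + controllability
    return {7: "A", 8: "B", 9: "C", 10: "D"}.get(total, "QM")


# Highest severity, highest exposure, hardest to control.
print(determine_asil(3, 4, 3))
# Same hazard, but occurring in a rarer driving situation.
print(determine_asil(3, 2, 3))
```

Note that the inputs describe only the hazardous event, not the technology
used to implement the function, which matches the statement above that the
ASIL is determined independently of the system's technologies.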

Identification of technical safety requirements: the top level safety 
requirements are detailed and allocated to system components. 

Identification of Software and Hardware safety requirements: The tech- 
nical safety requirements are divided into hardware and software safety 
requirements. The specification of the software safety requirements consid- 
ers constraints of the hardware and the impact of these constraints on the 
software. 

To follow the functional safety approach, the DESERVE applications should
apply the following main points: analyze risk early in the development
process, establish the appropriate safety requirements, and consider these
requirements in software and hardware development.

The impact of the standard is different for the development of warning 
functions, control functions or automated driving functions. 


2.5.4 Safety and AUTOSAR 


In the automotive domain, Ostberg and Bengtsson [14] propose an extension
to the AUTomotive Open System ARchitecture (AUTOSAR) which consists of a
safety manager that actively enforces the safety rules described in dynamic
safety contracts. Their main contribution is a conceptual model of a safety
architecture suitable for runtime-based safety assessment. Openness and
adaptivity were both addressed.

Also in the automotive domain, Frtunikj et al. [15] present a runtime
qualitative safety assessment that considers the Automotive Safety Integrity
Level (ASIL) and its decompositions in open automotive systems. In their
solution, the authors consider the modularization of safety assessment using
Safety Elements out of Context (SEooC) from ISO 26262. In their approach, the
SEooC was extended and the safety assessment is done at runtime by a Safety
Manager component.


2.5.5 Safety Mechanisms for DESERVE Platform


As an example, this section summarizes some features of the safety
mechanisms offered by Infineon's multi-core platform AURIX,
which represents a potential instance of the DESERVE platform (development
level 3). Its safety documentation includes:


• Safety case report providing the arguments, with evidence, that the
objectives of ISO 26262 and the safety requirements for a component are
complete and satisfied.

• FMEDA (customer and Infineon proprietary document).

• Safety manual including an overview of the assumed application use
cases and guidance for the application level, a summary of safety features
and mechanisms and their recommended use, as well as the summary of
achieved safety metrics and resulting ASIL compliance [13].


The AURIX microcontroller platform is developed as a SEooC (Safety
Element out of Context) and provides the safety mechanisms summarized
in Figure 2.16. It provides a Safe Computation Backbone compliant with
ISO 26262 ASIL D (this includes the Single Point Fault Metric, fully supported
by HW mechanisms, and the Latent Fault Metric, supported by SW (SafeTlib),
Logic BIST and MBIST). Support criteria for coexistence of elements are enabled
through a layered protection system (covering CPU tasks, Shared Memories,
Peripherals), CPU supervisor/user privileges, the Safety Task Attribute and a
rich set of counters & watchdogs for program flow & temporal monitoring. SEooC
deliverables are the Safety Library (SafeTlib), the Safety Manual to support
SEooC integration and the FMEDA to support computation of the ISO 26262
metrics.

Top Level Safety Requirements (TLSR) related to the microcontroller
I/O sub-system are specified by the system integrator, as these vary for
each application. TLSR1 (ASIL D) requires avoiding false output of the
microcontroller for longer than the FTTI (Fault Tolerant Time Interval,
Figure 2.17), while TLSR2 (ASIL B) only requires avoiding unavailability
of a safety mechanism for longer than one driving cycle.

Figure 2.16 SEooC safety mechanisms.

The Fault Tolerant Time Interval is defined more precisely in Figure 2.18.
The worst case of the application-dependent fault detection time is the
diagnostic time interval. The fault detection time depends on the safety
mechanism. The fault reaction time is the sum of the failure signaling time
and the failure reaction time. The failure signaling time depends on the
microcontroller architecture, while the failure reaction time depends on the
application. The failure signaling time is composed of the alarm forwarding
time plus the alarm processing time plus the failure signaling time.
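
The timing relations above can be sketched as a simple budget check. The
sketch assumes, as is common in ISO 26262 practice, that the sum of fault
detection time and fault reaction time must not exceed the FTTI; the class
name `FaultTimingBudget` and all numeric values are hypothetical and are not
AURIX data.

```python
from dataclasses import dataclass


@dataclass
class FaultTimingBudget:
    """Illustrative timing budget; all times in milliseconds."""
    fault_detection_ms: float    # worst case: the diagnostic time interval
    failure_signaling_ms: float  # depends on the microcontroller architecture
    failure_reaction_ms: float   # depends on the application

    @property
    def fault_reaction_ms(self):
        # Fault reaction time = failure signaling time + failure reaction time
        return self.failure_signaling_ms + self.failure_reaction_ms

    def total_ms(self):
        return self.fault_detection_ms + self.fault_reaction_ms

    def fits(self, ftti_ms):
        """True if detection plus reaction stays within the FTTI."""
        return self.total_ms() <= ftti_ms


# Made-up example values.
budget = FaultTimingBudget(fault_detection_ms=10.0,
                           failure_signaling_ms=0.5,
                           failure_reaction_ms=20.0)
print(budget.total_ms(), budget.fits(ftti_ms=50.0))
```

Separating the architecture-dependent signaling time from the
application-dependent reaction time mirrors the split of responsibilities
between the microcontroller supplier and the system integrator.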


Safety requirements 

With the AURIX as the basis for its realization, the DESERVE platform fulfils
the targets according to ISO 26262-5, 8.4.5, which defines requirements for the
ISO 26262 metrics. To achieve ASIL D, for instance, the single-point fault
metric (SPFM) needs to reach a minimum of 99% and the latent fault metric (LFM)
needs to reach 90% or above. The minimum values of SPFM and LFM shall
be reached by every vital part. The SPFM threshold levels shall be reached
both for permanent and for transient faults. For a given ADAS application, the
SPFM, LFM and PMHF (probabilistic metric for random hardware failures)
metrics are estimated based on the utilization of the vital, critical and
application-dependent parts.

In terms of PMHF for an ASIL D safety goal, ISO 26262-5 requires a metric
of less than 10 FIT (failures in time, referring to 10^9 hours). ISO 26262-5
9.4.3.6 and 9.4.3.7 specify the relationship between ASIL, FCR and DC
(residual faults). To meet ASIL D requirements, the diagnostic coverage for an
FCR5 part shall be > 99.99%. The safety mechanisms are designed to achieve a
coverage of 99.99%.
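
To make the thresholds concrete, the following minimal sketch computes the two
architectural metrics from failure rates given in FIT. It assumes the usual
definitions from ISO 26262-5 (SPFM as one minus the share of single-point and
residual faults, LFM as one minus the share of latent multiple-point faults
among the remaining faults); the function names and the example failure rates
are hypothetical and do not describe the AURIX.

```python
def spfm(total_fit, single_point_fit, residual_fit):
    """Single-point fault metric for safety-related hardware elements."""
    return 1.0 - (single_point_fit + residual_fit) / total_fit


def lfm(total_fit, single_point_fit, residual_fit, latent_fit):
    """Latent fault metric, relative to faults that are not SPF or RF."""
    return 1.0 - latent_fit / (total_fit - single_point_fit - residual_fit)


def meets_asil_d(total_fit, single_point_fit, residual_fit, latent_fit):
    """Check the ASIL D targets quoted above: SPFM >= 99%, LFM >= 90%."""
    return (spfm(total_fit, single_point_fit, residual_fit) >= 0.99
            and lfm(total_fit, single_point_fit, residual_fit,
                    latent_fit) >= 0.90)


def fit_to_failures_per_hour(fit):
    """1 FIT is one failure in 10^9 device hours."""
    return fit / 1e9


# Made-up failure rates in FIT for one hardware part.
print(spfm(1000.0, 2.0, 5.0))
print(lfm(1000.0, 2.0, 5.0, 50.0))
print(meets_asil_d(1000.0, 2.0, 5.0, 50.0))
```

With these example numbers the SPFM is 99.3% and the LFM is roughly 95%, so
both ASIL D targets would be met; the PMHF limit of 10 FIT corresponds to
10^-8 failures per hour.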


Safety architecture 

The safety architecture goal is to provide a safe computation platform for 
up to ASIL D safety applications according to ISO 26262, as this ASIL 
level is required for most next generation ADAS. To achieve this level, safe 
computation hardware and software, safe operating system as well as safe 
software architectures are required. 

The generic elements (vital parts) of a safe computation hardware platform
are summarized in Figure 2.19. A safe CPU requires hardware redundancy,
realized by a delayed lockstep CPU with enhanced timing and design diversity.
Safe SRAMs provide information redundancy (realized by standard SECDED
ECC and address signatures). Safe Flash memory is also needed for information
redundancy (realized by an enhanced ECC with more than 99% coverage
of arbitrary multiple-bit faults). Enhanced error detection codes covering
data & addressing faults lead to safe interconnects and support information
redundancy. The clock system is covered by frequency range monitors using an
internal, high-precision, independent clock source, and by internal & external
watchdogs. Finally, power supply range monitoring is implemented for the
internal regulators.

Figure 2.19 Generic elements of safe computation hardware platform.
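
The principle of SECDED (single error correction, double error detection) ECC
mentioned above can be illustrated with a toy code. The sketch below is an
extended Hamming (8,4) code protecting a 4-bit value; it is an assumption
chosen for readability and does not reflect the word width or the actual ECC
scheme of the AURIX memories. The function names `encode` and `decode` are
hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming code word.

    Index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming positions (1, 2, 4 are parity bits, 3, 5, 6, 7 are data bits).
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # overall parity over bits 1..7
    return code


def decode(code):
    """Return (nibble, status); status is ok, corrected or double_error."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                # XOR of the positions of set bits
    overall = sum(code) % 2
    code = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1                # syndrome 0 means parity bit 0
        status = "corrected"
    else:
        status = "double_error"            # detected, but not correctable
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)
word[6] ^= 1                               # inject a single-bit fault
print(decode(word))
```

A single flipped bit changes the overall parity and yields a syndrome that
points at the faulty position, so it can be corrected; two flipped bits leave
the overall parity unchanged while the syndrome is non-zero, which is reported
as an uncorrectable error.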

To achieve a safe computation software platform, an ASIL D compliant
operating system needs to be used, featuring memory protection and time
protection. Further, it needs to provide services for program flow monitoring,
end-to-end communication safety protocols as well as safe interrupt vector
generation. ASIL D compliant software is required to be developed according
to ISO 26262 part 6.

The AURIX platform ensures freedom from interference at software level
by means of SW isolation, while freedom from interference at hardware level
is guaranteed by HW isolation. The CPU MPU (memory protection unit)
monitors direct access to the local memories, applies to software tasks and
allows dynamic re-configuration. The bus MPU monitors the SRAM accesses
via the interconnect. Finally, register access protection monitors write access
rights to module registers.


3.1 Introduction


Traffic simulations are becoming more and more relevant for the development of
Advanced Driver Assistance Systems (ADAS) and algorithms for automated
driving. They are used to evaluate the functions with respect to important
impact factors like safety, efficiency, mobility or costs. For this purpose,
the system is tested and evaluated as a component of the virtual vehicle in
simulations. The manageability of the tested system and its acceptance by
users are investigated and evaluated in driving simulators, where a real
driver can be part of the virtual environment. Both in traffic simulations
and in simulators, realistic behaviour of the virtual road users surrounding
the equipped vehicle is an important requirement for a suitable
evaluation of the system, because this behaviour significantly influences the
reaction of ADAS and driver. Moreover, it is necessary that the behaviour
of the traffic can be adjusted systematically in order to generate defined
traffic situations of relevant constellations and with different degrees of
criticality. As in real traffic, small changes in the initial conditions can
produce a large difference in the result. This phenomenon can only be
reproduced in a simulation if the driving behaviour patterns closely reflect
human driver behaviour.

The basis of a driver model and its possible functionality or ability
is the underlying simulation environment. To determine the risk of congestion,
for example, a traffic simulation environment with macroscopic, e.g.,
fluid-dynamics-based, traffic behaviour is suitable. The simplest macroscopic
representation of virtual traffic could be an equation that yields an
average velocity as a function of the traffic density. Even a complex
mathematical relation of this kind may produce suitable results for some
purposes, but it does not make it possible to understand specific inner traffic
effects like congestion waves and traffic collapses. For such effects, the
influences of the traffic elements on the driver models have to be understood.
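
A minimal sketch of such a macroscopic speed-density relation is given below.
It uses the classic linear Greenshields relation as an assumption; the text
does not name a specific equation, and the parameter values (free-flow speed
in m/s, jam density in vehicles per metre) are purely illustrative.

```python
def mean_speed(density, free_speed=33.0, jam_density=0.15):
    """Greenshields relation: mean speed falls linearly with density.

    density and jam_density are in vehicles per metre, speeds in m/s.
    """
    density = min(max(density, 0.0), jam_density)  # clamp to the valid range
    return free_speed * (1.0 - density / jam_density)


def flow(density, free_speed=33.0, jam_density=0.15):
    """Traffic flow in vehicles per second: density times mean speed."""
    return density * mean_speed(density, free_speed, jam_density)


for k in (0.0, 0.05, 0.10, 0.15):
    print(k, mean_speed(k), flow(k))
```

Such a relation yields aggregate quantities only. It contains no individual
vehicles, which is why effects like congestion waves cannot be traced back to
the behaviour of single driver-vehicle units.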

These are basically the interactions of the driver-vehicle units with
each other and the reactions of the units to the traffic environment, like
traffic light systems or the road curvature. In this kind of traffic simulation,
called microscopic traffic simulation, the desired controlling reaction of the
driver or the automated function is calculated and implemented directly into
the vehicle. This is done in the form of a change of the dynamic state of the
vehicle, e.g., a desired acceleration, which consequently results in a change
of velocity and position. The driver and the vehicle represent an inseparable
unit with its own unit-specific behaviour. The behaviour might respect some
dynamic restrictions of the vehicle and in some cases of the driver, but does
not depict the driver-vehicle interaction.
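
The direct implementation of a desired acceleration into the vehicle state can
be sketched as a single integration step. The function name `step`, the time
step and the integration scheme are assumptions made for illustration; they do
not describe the update rule of any particular simulation tool.

```python
def step(position, speed, desired_acceleration, dt):
    """Advance one driver-vehicle unit by one time step.

    The desired acceleration is applied directly to the vehicle state;
    pedals and steering wheel are not modelled. Units: m, m/s, m/s^2, s.
    """
    new_speed = max(0.0, speed + desired_acceleration * dt)  # no reversing
    # Trapezoidal update of the position using the mean speed of the step.
    new_position = position + 0.5 * (speed + new_speed) * dt
    return new_position, new_speed


position, speed = 0.0, 10.0
for _ in range(4):
    position, speed = step(position, speed, desired_acceleration=2.0, dt=0.5)
print(position, speed)
```

Because the acceleration is written straight into the state, there is no
interface at which a driver could override an assistance function, which is
the limitation discussed next.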

For the analysis of modern ADAS this kind of simulation is not suitable,
as a driver has to be able, e.g., to override the system by using the control
elements, like pedals, steering wheel or switches. An ACC, for example, can be
switched off in a critical situation by using the brake pedal, or can be
overridden by using the accelerator pedal to further increase or keep the
acceleration. These effects can only be simulated if vehicle and driver are
implemented as separate models and if the interfaces between driver model and
vehicle model are used to convey the driver's wish to the vehicle. This concept
can be called sub-microscopic or nanoscopic.
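
The override behaviour described above can be sketched as a small arbitration
function at the interface between driver model and vehicle model. The function
name `arbitrate`, the signal names and the representation of pedal inputs as
acceleration requests are hypothetical simplifications and do not describe the
DESERVE implementation.

```python
def arbitrate(acc_active, acc_request, driver_brake, driver_accel):
    """Combine ACC and driver inputs into one acceleration command.

    acc_request:  acceleration requested by the ACC in m/s^2
    driver_brake: deceleration requested via the brake pedal (>= 0)
    driver_accel: acceleration requested via the accelerator pedal (>= 0)
    Returns (acc_still_active, acceleration_command).
    """
    if driver_brake > 0.0:
        # Brake pedal: the ACC is switched off and the driver's braking wins.
        return False, -driver_brake
    if acc_active:
        # Accelerator pedal overrides the ACC only towards more acceleration.
        return True, max(acc_request, driver_accel)
    return False, driver_accel


print(arbitrate(True, -1.0, 0.0, 0.5))   # driver overrides ACC deceleration
print(arbitrate(True, 1.0, 2.0, 0.0))    # braking switches the ACC off
```

The function only exists because driver and vehicle are separate models: the
pedal signals are explicit interface values rather than an acceleration
written directly into the vehicle state.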

Another specific application for sub-microscopic traffic simulations is the
exploration of detailed effects related to the vehicle, like fuel consumption
analysis in specific traffic situations or environments. These analyses
require a very detailed vehicle model. But it is not only the specific
application which lets us choose a higher level of traffic simulation.
Obviously, the higher the level of detail, the more effects can be depicted
with a single traffic simulation environment and model set-up, but at the
expense of computing time, up to the loss of real-time capability.
Additionally, the effort of setting up the models increases due to the
increased number of model parameters. For the same reason, the validation of
the models is much more complex, too.

In the past decades many driver models were developed with a special
focus on different specific elements of the driving task. Some try to show
an optimal behaviour, without taking into account the physical and cognitive
abilities and limitations of the human driver. Others focus on these restrictions
or on the information process in the driver's brain and body and the capability
of the driver to process different information in parallel. In the literature
many categories of driver models have been published. Jürgensohn defines in [1]
two basic categories of driver models, formal and non-formal models. Formal
models have a fixed description but a changeable inner value. The result of
formal models is reproducible, that is, the same conditions lead to the same
output. Non-formal models are not described by those fixed dependencies
(like equations or lingual definition) or they have a non-changeable (constant)
character. Examples of formal models are descriptive models, which have a
fixed description but have a character which is not defined by an input-output
structure. In the European research project ASPECSS [2] and in Deliverable
D3.1.1 [3] of the DESERVE project the definition is different. In these sources
descriptive models are clearly defined (fixed, but not constant) and generate
a numeric, quantitative output dependent on different numerical influences.
This output is reproducible but can nevertheless contain stochastic elements.
Functional models describe physical and psychological aspects of driving, like
the information processes and the human structure of thinking and acting. They
do not generate a numeric output but draw a picture of the elements of driving.
The difference between functional and descriptive models in this definition
is not unique and not complete; there are hybrid models and models which
cannot be matched to any of these categories. In this chapter, the distinction
between formal and functional models is used to avoid the conflict between the
two definitions of descriptive models.

In complex traffic simulations both kinds of models are
needed to depict realistic traffic flow and driving behaviour. Formal models
describe algorithms by which a driver model reaches its goal, setting
defined reference values dependent on the input. Functional models can help to
understand the driver's wishes and to create an eligible structure and decision
algorithm.

In the DESERVE project, a rapid prototyping platform for the development
of ADAS was created and a suitable tool-chain for the development process 
was outlined. The traffic simulation is an important tool in the development 
process of ADAS and thus is part of the DESERVE tool-chain. As described 
above, a realistic driver model is needed for the development and evaluation 
of modern ADAS. In the next sections, the way of modelling the driving 
behaviour is described, followed by the requirements for the DESERVE driver 
model. On the basis of the requirements, the structure of a sophisticated driver
model is developed and the implementation techniques and strategies used are
explained. In the last section, two different applications of the driver models
are presented.


3.2 Driver Modelling


Driving is not just a single decision and a single action at once. It is rather a
complex interplay of different motivations, perceptions, decisions and
states with continuous and discrete changes. To create a realistic driver
model, a strict delimitation between these elements has to be made, and it
is helpful to create a suitable structure with a unique and logical naming of
the elements and well-defined interfaces. To develop such a structure, driving
has to be analysed on the basis of typical driving scenarios, manoeuvres and
actions.

Besides perception and handling or action, information
processing is the most important part of driving. Within the information
processing, the driver estimates desired values for different future vehicle
states he or she wants to achieve, like a desired speed, a desired following
distance, and a distance to stop. These inner desired states are called
driver-variables or, briefly, variables. Often a driver has multiple desired
values for the same variable, generated by different motivations, between which
a decision is needed. The desired speed shall serve as an example: the
driver can have multiple reasons for choosing a desired speed, for example
the following three. First, to reach the destination as soon as possible.
Second, the speed limits on the road. Third, the curvature of the road
combined with the need for safety. For each motivation, a desired speed can be
determined. The desired speed for the first motivation is the maximum
speed the driver would choose on a free, straight road. If there are no further
influences like other road users or speed limits, the driver would travel at
this speed. Situations which do not allow travelling at this speed do not
imply that it is not the driver’s wish (the driver wants to, but cannot). For the
second motivation, a speed in an interval around the speed limit is desired,
depending on how law-abiding the driver is. This can be higher or lower than,
or exactly, the speed limit. The third motivation results in a desired speed
which allows the driver to pass a curve in a comfortable and safe manner.

All described motivations lead to different speeds, so the driver is in a
dilemma: she/he has to decide on one speed to accelerate or decelerate to.
The decision in this case is taken in a pragmatic way: the lowest speed wins,
because on the one hand there is a comfort and safety limit, and on the other
hand there is a limit because the driver accepts the given speed limits or at
least wants to avoid fines for driving too fast.
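
The "lowest speed wins" decision can be sketched as follows. The parameter
names, the multiplicative compliance factor for the speed limit and the use of
a maximum accepted lateral acceleration for the curve speed are assumptions
made for illustration; they are not the parameterisation of the DESERVE driver
model.

```python
import math


def desired_speed(free_speed, speed_limit, compliance_factor,
                  curve_radius, max_lateral_accel):
    """Pick the desired speed as the minimum over three motivations.

    free_speed:        speed chosen on a free, straight road (m/s)
    speed_limit:       legal limit (m/s)
    compliance_factor: > 1 for drivers who exceed limits, < 1 for cautious
                       drivers (hypothetical driver parameter)
    curve_radius:      radius of the upcoming curve (m), math.inf if straight
    max_lateral_accel: lateral acceleration the driver accepts (m/s^2)
    """
    limit_speed = speed_limit * compliance_factor
    # v = sqrt(a_lat * R) follows from a_lat = v^2 / R.
    curve_speed = math.sqrt(max_lateral_accel * curve_radius)
    return min(free_speed, limit_speed, curve_speed)


print(desired_speed(40.0, 30.0, 1.0, 100.0, 2.25))     # curve dominates
print(desired_speed(40.0, 30.0, 1.0, math.inf, 2.25))  # limit dominates
```

Keeping one candidate value per motivation, rather than a single blended
value, preserves the information that the driver would still like to drive
faster when the restricting influence disappears.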

The described example shows two input types to the driving behaviour:
the driver's character (here: need for safety, need for comfort and
law-abidance) and the current situation, described by the state of the own
vehicle and other vehicles as well as the road and environmental structure.
Moreover, not only the local situation influences driving. A good driver reacts
before reaching a discrete situation in order to reach the desired value in
time. In the curve speed example above, a real driver would estimate the
comfortable and safe speed based on the visual perception of the road's
curvature before reaching the curve. Based on that perception, the driver
decelerates at a rate which leads to the desired speed at the moment the curve
is reached. Within the curve the driver corrects this estimation to satisfy
the desired safety and comfort. The predictive behaviour is called anticipatory
driving. The correction is called compensatory driving [4]. This phenomenon
also has to be considered in the development of driver models.
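
The anticipatory part of this behaviour can be sketched with elementary
kinematics. The sketch assumes a constant deceleration over the remaining
distance to the curve, so that the target speed is reached exactly at the
curve entry; the function name is hypothetical and the compensatory correction
within the curve is not modelled.

```python
def anticipatory_deceleration(speed, target_speed, distance):
    """Constant acceleration that reaches target_speed after `distance`.

    Derived from v_target^2 = v^2 + 2 * a * d. Returns 0.0 if the vehicle
    is already slow enough or the curve has been reached. Units: m/s, m.
    """
    if distance <= 0.0 or speed <= target_speed:
        return 0.0
    return (target_speed ** 2 - speed ** 2) / (2.0 * distance)


# Approaching a curve 150 m ahead at 20 m/s with a desired curve speed
# of 10 m/s (made-up values).
print(anticipatory_deceleration(20.0, 10.0, 150.0))
```

Re-evaluating this value in every time step, with updated speed and remaining
distance, gives a simple form of continuous anticipation.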

Of course the driver has more responsibilities than deciding on the
desired speed. According to Rasmussen [5], the driving task can be seen on
three levels. On the strategic level, the driver plans and creates strategic
values like a route. On the manoeuvring level, the driver processes the
decisions and determines the desired values and value sequences with which the
strategy can be implemented. This behaviour is conscious: the driver knows
exactly how to solve the driving task and creates a strategy. The driver is
able to reflect on the decisions and actions he/she took on this level. On the
control level, the driver implements these conscious values into the vehicle by
using the steering wheel, the accelerator and brake pedals and other control
elements of the vehicle. This operation is not done in a single step. Often the
driver determines a subconsciously desired value, like a desired acceleration,
which is then transferred into the actual vehicle input. This value is not
reflected on by an experienced driver. It is an automatism of the driver to
reach the conscious desired value. The desired speed shall be used as an
illustration: after the decision to move freely, because no other road user is
influencing the driver, the desired speed is determined, which is a conscious
value. To reach this speed, the driver accelerates with the desired
acceleration, which is a subconscious value because the driver cannot quantify
this value and it is not part of the strategy. The final implementation is done
by using the vehicle’s controls to reach this acceleration. The advantage of
using this subconscious step is that the values concerned can be set,
manipulated and limited dependent on realistic driver’s needs independent of
the conscious
behaviour. Often the desired acceleration and the yaw rate or curvature are
used as outputs of microscopic driver models. In this definition these variables
represent subconscious variables. Thus, without the implementation by using
steering wheel and pedals, the model can be seen as a microscopic driver
model.


3.3 Requirements for DESERVE


Before creating a driver model, an analysis of the requirements for this model 
based on the field of application has to be done. In DESERVE, a rapid 
prototyping platform and development process has been created. The details of 
the platform can be found in Chapter 2. This requirements section concentrates 
on the applications of the DESERVE platform. In the first year of the project, 
the needs for the driver model were analysed in D3.1.1 [3]. Two kinds
of driver models are identified in the project: the virtual driver for use in
traffic simulations as described above, and the driver intention and distraction
model, which is used as a component of an ADAS to detect the real driver’s
state.

The literature review, the analysis of existing driver model concepts and
in particular the research work in the DESERVE project show that it is not
possible to create one holistic driver model to satisfy all scientific needs.
Nevertheless, it would be very attractive to have one basic structure
combining the ideas of the previous research, to which the algorithms can be
added as independent modules. The connections of all modules, with properly
defined affiliation and interfaces and in conjunction with a suitable parameter
set, will produce the expected results. For that reason, a generic
module-based structure needs to be developed which is well-defined and flexible
for amendments. Most of the integrated algorithms can be used for several
applications while others are specific to one. The generic structure should fit
all applications of driver modelling in an open way.

Another important issue is the implementation. Many driver models are
implemented in native programming languages. This has a significant
disadvantage: the model becomes very muddled due to the one-dimensional
structure of programming code. Often driver model structures are shown in a
two-dimensional representation with levels in the up-down dimension and the
sequence of the information processing in the left-right direction (time
related). An implementation of the driver model in an analogous structure could
be very helpful to create a clear and well-arranged model. Thus, a graphical
implementation is desirable. Furthermore, it should be possible to
structure or encapsulate the content properly and to define the
interfaces, in order to take advantage of modern programming techniques like
object-oriented programming or code reuse to avoid redundancy. In addition to
the structural requirements, the system shall be able to hold values or states
over one or more time steps to implement the memory of the driver. Another
requirement is the possibility to connect the driver model to the traffic
simulation environment. This can be done by communication interfaces or by
the native integration of the compiled driver model, for example as a
dynamically linked library or by similar techniques.

The driver model (virtual driver) in DESERVE shall be used in different
traffic simulation environments for testing and evaluating ADAS functions
during the development process. Within the project, the driver model shall be
implemented and tested for a control function which is designed to show the
advantages and benefits of the DESERVE platform. For this purpose, an Advanced
Cruise Control system (ACC) is combined with a Heading Control (HC).
The system shall assist the driver in inter-urban road scenarios and increase
safety within the full speed range (WP 4.2, [6, 7]). The inter-urban area was
chosen for demonstrating the system because it is a very important research
field for the use of ADAS functions of the next generation, especially those
which reach the next level of driving automation (cf. SAE automation level 2 -
partial automation, [8]). The evaluation of ADAS with regard to the increase of
safety is also important in the inter-urban area. Therefore, detailed driver
models are needed which are valid for the intended purpose. In particular, the
modelling of realistic human behaviour at intersections and junctions is one of
the most important developments for today's traffic simulations in order to
develop ADAS with the goal of reducing the high number of accidents at
intersections.

Analysing the application in DESERVE, the driver model requirements
can be briefly defined:


• Inter-urban driving behaviour, including safe passing of slow,
right-moving vehicles, has to be implemented.

• The driver model needs the capability of route following within
multi-lane roads and complex but flexible transport networks.

• Full intersection and traffic light behaviour has to be implemented.

• Anticipatory driving behaviour, like early speed adaptation, needs to be
reflected.

• Re-use of validated driving behaviour algorithms and driver model
approaches is required.


The driver model is implemented and connected to the simulation environment
PELOPS [9]. The inter-urban ACC and HC developed in DESERVE are tested
in virtual traffic scenarios containing units controlled by the driver model
described here. These scenarios include straight and curvy multi-lane roads,
complex intersections with traffic lights and right-of-way control by
signs and structure, different speed limits, sparse and dense traffic with
different parameterisations and slow-moving vehicles (e.g. mopeds). This
testing set-up leads to a set of manoeuvres and primary driving tasks which
have to be implemented:


Longitudinal                      Lateral
Free moving                       Lane keeping
Approaching/Following             Curve cutting
Braking in critical situations

Figure 3.1 Primary driving tasks which are implemented in the driver model within the
DESERVE project separated by longitudinal and lateral control.


Longitudinal Lateral
Stopping, standing and starting
Turning on intersections
Lane change
Safe passing

Figure 3.2 Manoeuvres which are implemented in the driver model within the DESERVE
project.


There are several other manoeuvres which can be implemented like U- 
turning or stopping on the road side. These manoeuvres are not implemented 
within DESERVE. Nevertheless, the structure of the model shall offer the 
possibility to enhance the functionality. 


3.4 Generic Structure 


In this chapter the ika driver model is introduced. Within the DESERVE 
project, a suitable and generic driver model structure was developed and 
implemented which fulfils the requirements from the previous section. The 
interfaces and driver parameters are defined and described in this chapter. 


3.4.1 Model Structure 


From the literature review, two generic structures can be identified: the three
levels of driving by Rasmussen and the three blocks of perception, information
processing and action, which can be found in several formal and non-formal
model approaches (e.g. [10]). This leads to the matrix-form model shown
in Figure 3.3. The modules (blue boxes) in the matrix represent model
implementations or parts of those. The arrows, in different shades of grey,
describe the information flow between the blocks and represent the internal
interfaces. The blue arrows show the information flow through the three
levels and represent the information (variables) needed for the driving tasks
and manoeuvres. A central functional block of the model is the State block,
where the driver-specific values are stored. The Memory module represents
the driver's knowledge about the current situation, the manoeuvre states, the
destination or route, etc. The memory is used to keep information for the
following time steps, for the duration of a manoeuvre or for the whole
simulation cycle. This information can be extrapolated to estimate current
states of the ego-vehicle or other road users even if the driver model does
not sense the information in question at the current time step. Thus, the
memory has an interface to the Perception block and constitutes an input of
this block in addition to the inputs from the environment and the vehicle.
Current manoeuvre states and important values which have to be known in the
next time step are also saved in the memory and are passed through the
interface between the State and the Processing block, where the driving
calculation is implemented. The parameters represent the driver's
character and are defined in two layers: qualitative and physical parameters
(see Subsection 3.4.2). The Parameters block serves its values to all blocks
of the driver model, for example by manipulating the handling time delay
(reaction time). The Action block controls the handling or the conversion of
the driver's wish into physical actions like the manipulation of the pedals, the
steering wheel, shifting and using the HMI control elements.

Figure 3.3 Driver model structure in the context of environment and vehicle: the structure
includes perception, processing and action blocks including their functional modules and the
regarded dynamic information flow.
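The role of the memory as a fallback input of the Perception block can be illustrated with a minimal Python sketch. It assumes a constant-velocity extrapolation, and the names `RememberedObject`, `extrapolate` and `perceive` are hypothetical; the chapter does not state how the ika driver model extrapolates remembered states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RememberedObject:
    """Last known state of another road user (hypothetical structure)."""
    position: float   # longitudinal position in m
    velocity: float   # velocity in m/s
    timestamp: float  # simulation time of the last observation in s


def extrapolate(memory: RememberedObject, now: float) -> RememberedObject:
    """Estimate the current state, assuming constant velocity since the last observation."""
    dt = now - memory.timestamp
    return RememberedObject(memory.position + memory.velocity * dt, memory.velocity, now)


def perceive(sensed: Optional[RememberedObject],
             memory: Optional[RememberedObject],
             now: float) -> Optional[RememberedObject]:
    """Use the sensed state if available, otherwise fall back to the extrapolated memory."""
    if sensed is not None:
        return sensed
    if memory is not None:
        return extrapolate(memory, now)
    return None
```

In this sketch a road user that is hidden for a few time steps is still available to the processing modules, at the price of an estimate that drifts away from reality the longer the object is not sensed.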

As can be seen in Figure 3.3, a strict assignment of every module to a
single level is not possible. In the following, the modules are explained
in detail.

In the Planning module, route-specific calculations are executed. In
general, the units have a fixed route which is calculated or set during the
initialisation of the simulation. In reality, a driver changes the route under
certain circumstances, e.g., traffic jams or road blocks. If such
functionality is needed, appropriate algorithms can be implemented in the
Planning module. In the current implementation the route is stored in the
memory. The Planning module calculates a value for each lane in the
environment around the unit, which quantifies how far the lane and its
successors can be followed on the given route. Thus, the Manoeuvre Decision
module can decide which lane the driver wants to take.

The Manoeuvre Decision module processes all discrete manoeuvres and discrete
decisions. This means, on the one hand, deciding for a manoeuvre and, on the
other hand, controlling the manoeuvre, but not calculating the related
Guidance Values. The Decision module returns different states within the
manoeuvre and process variables, which can be used by the following modules
to perform the manoeuvre (in the figure briefly named Manoeuvre). An example
is described in Section 3.5. Another output of the module is a set of
Discrete Secondary Actions which are needed or desired at the beginning of or
during the manoeuvre, for example switching the turning indicators in case
of turning or lane changes.

On the basis of the decision with its states and process values, a local
strategy to perform these manoeuvres and the continuous driving tasks is
calculated in the Conscious Guidance module. Continuous driving tasks are
performed during the whole simulation time without the need for a discrete
decision. The output values of these tasks can, however, be overridden by
other results. An example is the motivation to keep the lane: this task is
continuous because the driver always wants to stay in the lane but can be
forced to leave the lane during an overtaking manoeuvre. Within the
Conscious Guidance module the Guidance Variables are filled with the values
(guidance values) which the driver wants to reach. An example was given in
Section 3.2 (desired speed during free moving).

Several guidance values are calculated and passed to the Subconscious
Stabilisation module. Within this module, desired stabilisation values are
calculated. In general, these values are the desired acceleration and the
desired yaw rate for the longitudinal and lateral control respectively. Based
on all motivations, the stabilisation value with the highest benefit for the
driver is taken. Besides the desired values, some real physical values, which
are states of the vehicle, can be directly sensed by the driver. Thus, the
driver is able to implement these values subconsciously by using the vehicle
control elements (pedals and steering wheel). This implementation is done in
the Continuous Primary Actions module.
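The lane value calculated by the Planning module can be sketched as follows. This is only one possible reading of "how far the lane and its successors can be followed on the given route": the data layout (`Lane`, a route given as an ordered list of road ids) and the function names are assumptions made for the illustration, not the PELOPS or ika data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Lane:
    road: str                                            # id of the road segment the lane belongs to
    length: float                                        # lane length in m
    successors: List[str] = field(default_factory=list)  # ids of the following lanes


def lane_value(lanes: Dict[str, Lane], lane_id: str, route: List[str], index: int = 0) -> float:
    """Distance in m that can be driven on this lane and its successors without leaving the route.

    `route` is the ordered list of road ids; `index` is the position of the lane's road in it.
    """
    lane = lanes[lane_id]
    if index >= len(route) or lane.road != route[index]:
        return 0.0  # the lane is not on the route
    best_successor = max(
        (lane_value(lanes, succ, route, index + 1) for succ in lane.successors),
        default=0.0,
    )
    return lane.length + best_successor


def preferred_lane(lanes: Dict[str, Lane], candidates: List[str], route: List[str]) -> str:
    """Pick the candidate lane that follows the route for the longest distance."""
    return max(candidates, key=lambda lane_id: lane_value(lanes, lane_id, route))
```

For a road with a through lane and a turning lane, the through lane receives the larger value when the route continues straight on, so a decision module that compares these values would prefer it.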

To define the interfaces between the modules it is helpful to create a 
manoeuvre and driving task table. For the DESERVE implementation the 
following tables (Figure 3.4 and Figure 3.5) were developed, derived from 
Figure 3.1 and Figure 3.2. 

In the free moving motivation, the desired velocity of the driver
is calculated. This velocity depends on the speed limit, the curvature
of the road ahead and the maximum desired velocity of the driver. To
reach the velocity, the driver model accelerates (subconsciously) depending
on the current velocity and the desired velocity. A suitable model
approach is part of the Intelligent Driver Model (IDM) by Treiber, Hennecke
and Helbing [11]. An adaptation of that approach for use in
complex driving simulations is published in [12]. The following motivation
is mainly influenced by a desired following distance which is based on a
driver-specific following time gap. To reach this distance the driver needs to
accelerate or decelerate. Lane keeping is performed using fix-points
based on the Two-Point Visual Control Model published in [13]. This model
can be adapted so that the fix-points cause a yaw rate which the driver wants
to implement. The adaptation is published in [14]. The yaw rate is chosen as the
desired subconscious stabilisation value because it physically implies both
the curvature and the velocity. During standing, the driver model maintains a
brake pedal value which keeps the vehicle from moving. This means
that the pedal value is a subconscious value, in contrast to the other
longitudinal tasks.
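The free moving and following motivations can be sketched with the commonly cited form of the IDM acceleration equation. The sketch below is not the adapted approach of [12]: the parameter values are illustrative, the curve speed derived from a maximum lateral acceleration is an assumption, and the clamp on the dynamic part of the desired gap is a frequently used safeguard rather than part of the chapter.

```python
import math


def desired_velocity(speed_limit, curvature, v_max_driver, a_lat_max=2.0):
    """Desired speed in m/s as the minimum of the three influences named in the text.

    The curve speed sqrt(a_lat_max / |curvature|) is an assumption of this sketch.
    """
    v_curve = math.sqrt(a_lat_max / abs(curvature)) if curvature != 0.0 else math.inf
    return min(speed_limit, v_curve, v_max_driver)


def idm_acceleration(v, v_des, gap=math.inf, dv=0.0,
                     a_max=1.5, b_comf=2.0, s0=2.0, time_gap=1.5, delta=4.0):
    """IDM acceleration in m/s^2.

    v: own speed, v_des: desired speed, gap: net distance to the leader in m,
    dv: approach rate (own speed minus leader speed) in m/s.
    """
    free_term = 1.0 - (v / v_des) ** delta
    # Desired gap: minimum distance plus time-gap and braking contributions.
    s_star = s0 + max(0.0, v * time_gap + v * dv / (2.0 * math.sqrt(a_max * b_comf)))
    interaction = (s_star / gap) ** 2 if math.isfinite(gap) else 0.0
    return a_max * (free_term - interaction)
```

With an infinite gap only the free moving term remains, which corresponds to the Velocity/Acceleration row of the motivation table; with a finite gap the interaction term reproduces the following motivation based on a following time gap.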


Motivation      Conscious     Subconscious    Action

Free moving     Velocity      Acceleration    Pedal value
Following       Distance      Acceleration    Pedal value
Lane keeping    Fix-points    Yaw rate        Steering wheel angle
Standing        -             Pedal value     -


Figure 3.4 Process variables for the four basic driving motivations free moving, following,
lane keeping and standing.


Manoeuvre          Conscious         Subconscious        Action

Lane change        1. Lat. offset    1. Lat. velocity    Steering wheel angle
(Two-phase lc.)    2. Fix-points     2. Yaw rate
Stopping           Stop point        Acceleration        Pedal value
Safe passing       Fix-points        Yaw rate            Steering wheel angle


Figure 3.5 Process variables for the three manoeuvres lane change, stopping and safe
passing.


Figure 3.5 does not contain the turning manoeuvre. In this model, turning is
implemented in the decision module, at least to control the turning indicators,
but does not require a process implementation of its own: laterally and
longitudinally, a turning manoeuvre can be treated as a ‘normal’ street
following motivation if the turning path is known and a turning speed is
calculated from the given curvature. In the case of conflicts with road users
that have the right of way (e.g. at left turns), the driver model stops by
means of the stopping manoeuvre. Once the conflict is resolved, the stopping
manoeuvre is aborted and the driver model switches to free moving or following.

The perception is partly done in the simulation environment: all perceived
information is transformed to the driver's coordinate system by the simulation
environment. The driver model then modifies the information with driver-specific
perception errors, such as perception limits, continuous noise, sporadic
disturbances or fluctuations and accuracy limits.
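How such perception errors might be applied to a single perceived quantity is sketched below. The function name, the order of the error stages and all numerical values are assumptions chosen for illustration; the chapter does not specify the error models used in the ika driver model.

```python
import random


def perceive_distance(true_distance, rng, max_range=150.0, noise_std=0.5,
                      resolution=0.5, dropout_probability=0.02):
    """Return the distance in m as the driver perceives it, or None if nothing is perceived.

    All parameter values are illustrative assumptions.
    """
    if true_distance > max_range:                        # perception limit
        return None
    if rng.random() < dropout_probability:               # sporadic disturbance
        return None
    noisy = true_distance + rng.gauss(0.0, noise_std)    # continuous noise
    quantised = round(noisy / resolution) * resolution   # accuracy limit
    return max(0.0, quantised)
```

Passing a seeded `random.Random` instance keeps a simulation run reproducible, which is useful when the same scenario has to be repeated with different driver parameters.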


3.4.2 Parameter Structure 


In many driver model approaches, such as the IDM [11], physical parameters are
used to influence the driver behaviour and to generate heterogeneous or
driver-specific results. Examples of physical parameters are the maximum
comfortable acceleration and deceleration or a constant following time gap to
the leading vehicle. These parameters can be measured well for a single driver
or a group of drivers, represent a direct input to the model approaches and
are mostly independent of each other. To describe the character of a driver,
a large set of physical parameters has to be defined. Other driver models use
humanised parameters on a higher level which are not directly
measurable. These parameters have a meaning which can be described as
a characteristic or a constant attribute of a human driver. In general, they
are used to generate driver-specific physical parameters, which
are then dependent on each other through this humanised characteristic. With
these parameters a characterisation of the driver is easier because the number
of parameters is reduced. The challenge is to create
a mathematical dependency which returns realistic results based on these
fictive parameters. The humanised parameters used in the driver model for
the DESERVE platform are named sportiness, need for safety, law-abiding
and estimation ability. However, these parameters have no scientific physical
or psychological meaning; they only represent groups of drivers and influence
the underlying block of physical parameters such as the desired following
time gap, the acceleration profile and many more. Figure 3.6 shows the
parameter concept of the DESERVE platform: the first block contains the
humanised parameters, which influence the physical parameters of
the driver model. In this example, the need for safety parameter influences the
lower and upper following time gap (see [15]) and the acceleration profile of
the driver model. Parameters are not influenced by the dynamic inputs.

The set-up of a suitable parameter concept influencing all models in a
realistic way is difficult and extremely dependent on the implemented model
approaches. One way to solve this problem could be to measure a large set
of reference data and run an optimization to find the best-fitting parameters.
Afterwards, a validation has to be done with another set of data to prove the
concept.

Figure 3.6 Sketch of the parameter blocks (brown) and model blocks (blue) of the driver
model.

To create a traceable connection between the parameter blocks, cubic
polynomial functions are used in the DESERVE model. In a review of floating
car data, for example, the distribution of lower following time gaps of the
Wiedemann model was generated. The basis of the distribution of these time gaps
is a Gaussian distribution of the need for safety parameter with μ = 0.5 and
σ = 0.15 as described in [15]. With the polynomial


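\[ \Delta T_{\mathrm{lower}}(p_{\mathrm{NFS}}) = 1.4\,p_{\mathrm{NFS}}^{3} + 0.9\,p_{\mathrm{NFS}}^{2} + 0.9\,p_{\mathrm{NFS}} \tag{3.1} \]
with
$\Delta T_{\mathrm{lower}}$: Lower following time gap [s]
$p_{\mathrm{NFS}}$: Need for safety; Gaussian distributed (0.5, 0.15) [-],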

the distribution of the lower following time gap yields the red
curve in Figure 3.7. The blue bars show the floating car data on which
the polynomial curve in this example is based.

[Plot: probability of appearance [%] over lower following time gap [s]]

Figure 3.7 Distribution of lower following time gaps for real drivers (blue bars) and the
modelled distribution dependent on a normally distributed need for safety parameter (red line).
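The mapping from a humanised parameter to a physical parameter can be sketched in a few lines of Python. The sketch assumes that the coefficients 1.4, 0.9 and 0.9 of Equation (3.1) belong to the cubic, quadratic and linear term respectively and that the polynomial has no constant term; clipping the Gaussian samples to the interval [0, 1] is a further assumption, and the function names are hypothetical.

```python
import random


def lower_time_gap(p_nfs, coefficients=(1.4, 0.9, 0.9)):
    """Map the need-for-safety parameter (0..1) to a lower following time gap in s.

    Assumed form: c3 * p^3 + c2 * p^2 + c1 * p (no constant term).
    """
    c3, c2, c1 = coefficients
    return c3 * p_nfs ** 3 + c2 * p_nfs ** 2 + c1 * p_nfs


def sample_time_gaps(n, rng, mean=0.5, std=0.15):
    """Draw n drivers with a Gaussian need for safety and return their lower time gaps."""
    gaps = []
    for _ in range(n):
        p = min(1.0, max(0.0, rng.gauss(mean, std)))  # clipping to [0, 1] is an assumption
        gaps.append(lower_time_gap(p))
    return gaps
```

A histogram of `sample_time_gaps(10000, random.Random(42))` could then be compared with measured time gaps, which is the kind of comparison the figure shows for the floating car data.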

This principle can be used and optimised analogously for the other physical 
parameters. 


3.5 Implementation 


The graphical programming tool Matlab/Simulink provides the implementation
features described in the requirements in Section 3.3. The 2D graphical
user interface allows a clear and well-arranged implementation close to the
visual structure of the model. The implementation is easy to understand and
easy to debug. In the university environment, many students and scientific
assistants work with the driver model for a limited period of time (e.g.
Bachelor's/Master's theses or PhD theses). Thus, a further important
requirement is the comprehensibility of the model. Programming in Simulink is
easy to learn, even without deep knowledge of classic programming languages.
The code can be encapsulated in subsystems with defined inputs and outputs,
and several storage concepts can be used to implement the driver's memory.
The data connection between the model and other tools can be established
using UDP, TCP/IP or other versatile techniques.
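A data connection via UDP essentially requires both sides to agree on a binary message layout. The following sketch shows the idea with Python's standard library; the message layout (three little-endian doubles), the host and the port are hypothetical and do not describe the actual PELOPS interface.

```python
import socket
import struct

# Hypothetical message layout: simulation time, velocity and gap to the leader
# as three little-endian doubles.
MESSAGE_FORMAT = "<3d"


def pack_inputs(sim_time, velocity, gap):
    """Serialise the model inputs into a fixed-size datagram payload."""
    return struct.pack(MESSAGE_FORMAT, sim_time, velocity, gap)


def unpack_inputs(payload):
    """Recover (sim_time, velocity, gap) from a received payload."""
    return struct.unpack(MESSAGE_FORMAT, payload)


def send_inputs(sim_time, velocity, gap, host="127.0.0.1", port=25000):
    """Send one datagram to the model; host and port are placeholders."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(pack_inputs(sim_time, velocity, gap), (host, port))
```

Because every time step costs at least one datagram in each direction, this kind of coupling adds communication overhead per simulated vehicle.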

For the DESERVE example implementation, PELOPS is used as the
simulation kernel, which provides the environmental structures (road network,
traffic lights, etc.) and vehicle models. The core of the new version of PELOPS
is implemented in Java. A Simulink model can be integrated via
the UDP communication interface. For the simulation of one vehicle
this solution is suitable and real-time capable with the current version of
the ika driver model and PELOPS. If multiple vehicles use the same driver
model instance with their specific inputs, at least the time-dependent and
memory-containing modules do not work properly. For the simulation of two or
more vehicles, the Simulink model therefore needs to be duplicated so that
each vehicle has an independent copy (instance) of the driver model. This
becomes difficult for a high or flexible number of vehicles in a simulation.
Another problem is the high execution time due to the UDP connection and the
Simulink model itself. A native execution combined with direct data exchange,
e.g. by shared memory, is much faster. The Matlab/Simulink tool-chain offers
code generation: the desired model can be converted to C or C++ code which
can be integrated in other C/C++ or FORTRAN code or compiled
to a shared library on almost all computing platforms. In DESERVE this
solution is used to integrate the driver model into PELOPS. For that purpose,
a class wrapper is used around the generated code. This allows the simulation
environment to create an almost unlimited number of independent driver model
instances. Multiple test cases have been performed to show the capability of
running traffic simulations with the full functionality of the driver models and
a large number of traffic units in real time.
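Why a class wrapper solves the instance problem can be shown with a small Python sketch. It is only an analogy for the wrapper around the generated C/C++ code: the classes `DriverModel` and `Simulation` and their methods are hypothetical, and the "model" is reduced to a single remembered value.

```python
class DriverModel:
    """Stand-in for generated model code wrapped in a class (hypothetical interface)."""

    def __init__(self, time_gap):
        self.time_gap = time_gap    # driver-specific parameter
        self.previous_gap = None    # memory kept between time steps

    def step(self, gap, dt):
        """Return the closing rate in m/s estimated from the remembered gap."""
        rate = 0.0 if self.previous_gap is None else (self.previous_gap - gap) / dt
        self.previous_gap = gap
        return rate


class Simulation:
    """Creates one independent driver model instance per vehicle."""

    def __init__(self):
        self.drivers = {}

    def add_vehicle(self, vehicle_id, time_gap=1.5):
        self.drivers[vehicle_id] = DriverModel(time_gap)

    def step(self, gaps, dt=0.1):
        """Advance every driver by one time step with its own input."""
        return {vid: self.drivers[vid].step(gap, dt) for vid, gap in gaps.items()}
```

Each instance keeps its own memory, so the remembered gap of one vehicle is never mixed with the input of another, which is exactly what fails when several vehicles share a single model instance.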

Except for the decision module, all modules are implemented in standard
Simulink subsystems with mathematical blocks. The decision module is
implemented in Stateflow, which is an integrated Simulink feature. Stateflow
allows implementing state machines, which are a suitable implementation
technique for discrete decision structures. To demonstrate a possible
implementation of a manoeuvre decision, the lane change is used as an example:
Figure 3.8 shows a state machine implementation of a lane change decision
including the progress and sequence control. The progress describes the state
or the ‘position’ in the lane change: initialisation (init), origin lane, lane
crossing (LC), target lane and termination (term). The phases describe the
phase control of the lane change by the driver. In this example the driver
uses two phases to perform the lane change: in the first phase the driver
accelerates laterally to a desired lateral velocity (anticipatory) dependent
on the lateral offset. In the second phase, the driver 'switches' to the
lane-keeping mode with the focus on the target lane (compensatory) by using
the fix-point approach (see Figure 3.5). Depending on the phase and the
progress, the conscious guidance module calculates the reference values which
are needed to steer the vehicle to the desired lane. Transition A denotes the
decision to perform the lane change, which is valid if there is a lane next to
the ego driving path with higher correlation to the route and if some other
conditions are met, such as distance to the end of the lane, preference lane
and a hysteresis. The basis for the decision is described in [16]. A decision
for a lane change does not mean an immediate reaction. The driver model can
decide before the lane or the desired gap is reached. In the case of a
positive decision, the lane change is initialized. This is a continuous
process as long as the active lane change has not started. The transitions B
control the progress of the lane change, and transition C represents
the transition from the first phase to the second one in this example.

Figure 3.8 Stateflow model for a two-phase lane change including decision (A), progress
control (B) and sequence control (C).
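The structure of such a state machine can be sketched outside Stateflow as well. The Python class below follows the progress states and the two phases named in the text, but all transition conditions and thresholds (lane width fractions, the offset at which the phase switches) are assumptions made for the illustration; the actual conditions are those of the Stateflow model and of [16].

```python
from enum import Enum


class Progress(Enum):
    NONE = "no lane change"
    INIT = "init"
    ORIGIN_LANE = "origin lane"
    LANE_CROSSING = "LC"
    TARGET_LANE = "target lane"
    TERM = "term"


class LaneChange:
    """Two-phase lane change; thresholds and conditions are illustrative assumptions."""

    def __init__(self, lane_width=3.5, phase_switch_offset=1.0):
        self.lane_width = lane_width
        self.phase_switch_offset = phase_switch_offset
        self.progress = Progress.NONE
        self.phase = 0  # 0: inactive, 1: anticipatory, 2: compensatory

    def update(self, wants_change, gap_available, offset_to_target):
        """Advance one step; offset_to_target is the lateral distance to the target lane centre in m."""
        if self.progress is Progress.NONE:
            if wants_change:                                 # transition A: decision
                self.progress = Progress.INIT
        elif self.progress is Progress.INIT:
            if not wants_change:
                self.progress = Progress.NONE               # decision withdrawn before the start
            elif gap_available:                              # transition B: active change starts
                self.progress, self.phase = Progress.ORIGIN_LANE, 1
        elif self.progress is Progress.ORIGIN_LANE:
            if offset_to_target < 0.75 * self.lane_width:    # transition B
                self.progress = Progress.LANE_CROSSING
        elif self.progress is Progress.LANE_CROSSING:
            if offset_to_target < 0.25 * self.lane_width:    # transition B
                self.progress = Progress.TARGET_LANE
        elif self.progress is Progress.TARGET_LANE:
            if offset_to_target < 0.1:                       # transition B
                self.progress = Progress.TERM
        elif self.progress is Progress.TERM:
            self.progress, self.phase = Progress.NONE, 0
        if self.phase == 1 and offset_to_target < self.phase_switch_offset:
            self.phase = 2                                   # transition C: phase 1 -> 2
        return self.progress, self.phase
```

The returned pair of progress and phase corresponds to the information a guidance module would need in order to choose between the lateral velocity and the fix-point reference values.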


3.6 Applications in DESERVE and Results 


Within the DESERVE project, the driver model was used for two different 
applications: The validation of left turn simulations within the full parameter 
range and the prediction of a real driver regarding the acceleration during free 
driving, approaching and following. 

For the validation of left turn simulations (in this example without
stopping), real traffic data from laser scanners were used: the
trajectories of 136 vehicles were measured at a junction in Alsdorf, close to
Aachen in Germany. Figure 3.9 shows the results of the simulations for
different parameter sets (coloured curves). The measured real-driver data are
shown in grey, and the boundaries (extreme drivers) as well as the average
driver are included. The extreme drivers are generated by choosing
the minimum and the maximum, respectively, of the need-for-safety and
law-abiding parameters. For this example, the upper extreme driver is created
by setting the need-for-safety and the law-abiding parameters to zero and the
lower extreme driver is created by setting these parameters to one. As can
also be seen in the figure, the law-abiding parameter influences the speed the
driver reaches before and after passing the intersection but not the velocity
during the turning (red lines). In contrast, the need-for-safety parameter
influences the speed within the turning only (blue line). This result supports
the statement that the turning speed is mainly driven by the safety and
comfort motivations of the driver, whereas the speed on straight roads is
defined by the acceptance of speed limits. The phases between approaching and
turning represent a mixture of all motivations and result in a transition
of the speed. In this example, the other parameters are set to the average
value (0.5).

Figure 3.9 Trajectories (velocity over x- and y-position) for the left turn including the
simulation results for different parameter sets. The real driver data is measured on one
intersection with 136 different drivers during day time.

To predict the driving behaviour of a real driver in a vehicle, the driver
model was integrated as a module on a real-time system in the car and supplied
with real sensor data from radar and camera sensors. A five-second simulation
is calculated in each prediction step and the result is written to the CAN bus.
With that data, ADAS like ACC can react depending on the estimated wish of
the driver. The system and the results are published in [12].


3.7 Conclusions and Outlook 


In the DESERVE project a driver model structure was developed with the
focus on the realistic generation of driver-vehicle-environment interactions.
For use in traffic simulations the driver model has been implemented in
Matlab/Simulink and, as an example, integrated in PELOPS. The addressed
traffic area covered the inter-urban road network including generic
intersections. Both common driver model approaches and newly conceived
approaches were used to create the modules needed in DESERVE and to obtain
realistic driving behaviour. The elementary interactions between the driver
models, the associated vehicles and the surrounding environment result in
realistic traffic phenomena and effects that occur in equivalent real traffic
situations, which was shown by comparing the simulation results with measured
data from a real intersection. The model behaviour is tuneable via parameters on
two levels, a humanised and a physical level, which have indirect and direct
influence on the model behaviour.

The structure of the model was designed to offer the possibility of
enhancing the driver model by using different model approaches or expanding it
with the capability of performing yet unimplemented manoeuvres and driving
tasks. In those cases, the challenge is to tune the added model approaches
while maintaining the realistic influence of the parameters. To simplify and
partly automate the tuning process, a tool can be implemented which uses real
data to optimize the mathematical influence of the parameters on the model.
This work will be done in the future to increase the usability of the driver
model for the simulative analysis of traffic situations. The traffic simulation
and thus the driver model shall be an inherent part of the tool chain used in
the development of ADAS and functions of automated driving.


4.1 Introduction


Developing multi-modal applications from scratch is a tough task.
On the one hand, there are algorithmic challenges such as detecting drowsiness
or pedestrians in every possible situation. On the other hand, there are
programming challenges such as handling data from multiple sensors with
different frequencies and of different nature (video streams, GPS data, laser
scans, etc.), as well as implementation details such as synchronization
techniques, multithreading and memory management, to name only a few.

Moreover, the time required to develop the software is often
underestimated [1]. Using an already existing middleware helps to keep on
schedule and to focus mainly on business problems while decreasing the
real-time programming complexity.

Several middleware products fit this description
(ADTF, PolySync, BaseLabs and RTMaps). As RTMaps is the official
middleware chosen for the DESERVE project and the author is very familiar with
it, this chapter will sometimes focus on RTMaps, but other tools
might apply as well.


4.2 Using a Middleware 


Considering software as layered, middleware incorporates many of these
layers vertically. A middleware provides a full or partial solution to an area
within the application and supplies more than a basic library: it also supplies
associated tools for logging, debugging and performance measurement.
Because middleware is a vertical system, it may compete with or duplicate other
parts of the application.


4.3 The Multisensor Problem 


The number of sensors used for ADAS applications has increased in the last
few years. Applications now use radars, lidars, GPS, high-definition stereo
cameras, lasers, IMU, CAN bus, eye trackers, V2V and V2I communication,
etc. The problem is how to read all of them within the same application
and especially how to synchronize them despite their very different nature
(Figure 4.1).

In fact, most algorithms need to use several sensors to reach
a good level of detection. The problem is that those sensors might have
different sampling rates or, even worse, event-based outputs. Reading from
those sensors simultaneously can be a tricky problem to solve. Let us illustrate
this with an example with three signals.

In Figure 4.2, signal A (orange) and signal B (green) are periodic with
different periods, while signal C (red) is an event-based signal. One solution
would be to perform the reading at the least common multiple of all sampling
rates. While this approach may work with periodic signals like A and
B, it won't work with the event-based signal C.
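The limitation of a common reading rate can be checked with a few lines of Python. The sketch interprets the common rate as the least common multiple of the (integer) sampling rates, so that every periodic sample falls on a tick of the reading grid; the example rates and the function names are assumptions for the illustration.

```python
from fractions import Fraction
from math import gcd


def common_rate(rates_hz):
    """Smallest rate in Hz that is an integer multiple of every given integer rate."""
    result = 1
    for rate in rates_hz:
        result = result * rate // gcd(result, rate)  # least common multiple
    return result


def on_grid(event_time, rate_hz):
    """True if an event time in s (given as a Fraction) coincides with a tick of the grid."""
    return (event_time * rate_hz).denominator == 1
```

For two periodic signals at 30 Hz and 50 Hz the common grid runs at 150 Hz and contains every sample of both signals, whereas an event occurring at an arbitrary time, for example after 1/7 s, falls between two ticks and is only seen late or not at all.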

Figure 4.1 ADAS functions require many different types of sensors.

Figure 4.2 Synchronisation issues.

To read from multi-modal sensors, the RTMaps middleware is
fully asynchronous - each component runs in its own thread - so that any
component can react to any data stream, whatever sampling rate it may have.
This is the only way to follow the natural pace of each data stream. Internally,
this design uses blocking calls, which avoids the extra latency that could
occur when using polling methods. The RTMaps middleware also defines reading
policies to synchronize data streams. While the default policy - reactive -
works perfectly well in most cases, the user can choose one of the following:

• Reactive reading: a component with multiple inputs will read every time
  a new data sample is made available on any one of its inputs.

• Synchronized reading: a component with multiple inputs will process
  one sample from each input when data samples with the same timestamp
  (plus or minus some configurable tolerance) are available on its inputs.
  This behaviour is made for data fusion and allows re-synchronization of
  the data streams at any point downstream in the diagram, whatever the
  latency of the various upstream data channels.

• Triggered reading: a component with multiple inputs will read when a
  new data sample is made available on a given input. It will then resample
  the data on its other inputs through non-blocking reading.
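The difference between the synchronized and the triggered policy can be illustrated offline on two recorded streams. The following Python sketch does not use the RTMaps API; the functions, the list-of-tuples representation and the matching strategy are assumptions that only mimic the behaviour described in the list above.

```python
def synchronized_pairs(stream_a, stream_b, tolerance):
    """Pair samples of two time-sorted streams whose timestamps differ by at most `tolerance`.

    Each stream is a list of (timestamp, value) tuples.
    """
    pairs, i, j = [], 0, 0
    while i < len(stream_a) and j < len(stream_b):
        ta, tb = stream_a[i][0], stream_b[j][0]
        if abs(ta - tb) <= tolerance:
            pairs.append((stream_a[i], stream_b[j]))
            i += 1
            j += 1
        elif ta < tb:
            i += 1  # the sample of A has no partner and is skipped
        else:
            j += 1  # the sample of B has no partner and is skipped
    return pairs


def triggered_read(trigger_stream, other_stream):
    """For each trigger sample, resample the latest value of the other stream (or None)."""
    result, j, latest = [], 0, None
    for t, value in trigger_stream:
        while j < len(other_stream) and other_stream[j][0] <= t:
            latest = other_stream[j][1]
            j += 1
        result.append((t, value, latest))
    return result
```

With a camera stream at timestamps 0, 40 and 80 and a lidar stream at 2, 52 and 81 (arbitrary time units), a tolerance of 5 yields two synchronized pairs, while triggered reading on the camera produces one output per image together with the most recent scan available at that time.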


To sum up, the middleware not only provides a common platform on which to
build the ADAS application, but also takes care of the tricky data
synchronisation mechanism.


4.3.1 Knowing the Date and Time of Your Data 


Using a middleware allows the timing of the data to be tracked very accurately.
For example, RTMaps assigns two dates to each data sample: the timestamp and
the time of issue.


• The timestamp is the intrinsic date of the sample. It is as close as possible
  to the date of occurrence of the real data which the sample corresponds
  to. It is often supplied by the first component that created the sample (i.e.
  the acquisition component). The timestamp remains unmodified while the
  sample goes through the different components of the processing chain.
  The timestamp often corresponds to the date at which the data is available
  in system memory.

• The time of issue is the date corresponding to the last time the sample
  was output from a component. Therefore, this date increases as
  the sample runs through the different processing components.

Knowing the time and date of the data precisely is essential to perform
synchronized readings (see previous section), but it is also useful to estimate
the latency of the data or to know the processing time of a component, which
is vital in real-time applications.
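How the two dates can be used to estimate latency is shown in the following Python sketch. The `Sample` record and its field names are hypothetical and are not taken from the RTMaps SDK; dates are given in microseconds for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record; field names are not taken from the RTMaps SDK."""
    value: object
    timestamp: int       # intrinsic date of the data in microseconds, set once at acquisition
    time_of_issue: int   # date of the last output from a component in microseconds


def acquire(value, now):
    """Acquisition component: both dates start out equal."""
    return Sample(value, timestamp=now, time_of_issue=now)


def forward(sample, new_value, now):
    """Processing component: keep the timestamp, update the time of issue."""
    return replace(sample, value=new_value, time_of_issue=now)


def latency(sample):
    """Age of the data when it left the last component, in microseconds."""
    return sample.time_of_issue - sample.timestamp
```

The difference between the times of issue before and after a component gives an estimate of the time spent in that component, while the difference between time of issue and timestamp gives the accumulated latency of the whole chain.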


4.3.2 Component-based GUI 

The RTMaps middleware comes with a user-friendly graphical interface which
allows building an application using components (seen as blocks) connected
to each other. Figure 4.3 shows the RTMaps Studio with a diagram open and
a few components in it.

Figure 4.3 The RTMaps Studio.

The advantage of using a graphical user interface is twofold. Firstly,
it allows the user to quickly construct an application by using drag and
drop techniques and wiring components to each other. Realizing a simple
demonstration with a camera and an IMU only takes a few minutes [2],
whereas using only hand-written code with dedicated libraries would take
weeks.

Secondly, it allows the team to focus on interfaces. This is a very important
point since it defines boundaries and clarifies the division of work between
teams. In big projects like the DESERVE project, strict definitions of
interfaces are necessary due to the number of partners. The interface of a
component is composed of inputs, outputs and properties. Once the interface of
a task is defined, exchanging one algorithm for another is no longer a problem:
one component can simply be replaced by another. In Figure 4.4,
the face detection component has one fixed detection interface. The input is
a YUV image and the output is a vector of rectangles representing the faces
found.

Furthermore, the use of macro-components can simplify the
diagram by splitting the global problem into sub-problems (Figure 4.4). All
the implementation is hidden at first sight to simplify the reading, but
of course looking under the mask would reveal all the internal details.
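The benefit of a fixed interface can be illustrated in code, independently of any particular middleware. In the following Python sketch the interface `FaceDetector` and both placeholder detectors are hypothetical; they only mirror the example of one YUV image as input and a vector of rectangles as output.

```python
from typing import List, Protocol, Tuple

Rectangle = Tuple[int, int, int, int]  # x, y, width, height in pixels


class FaceDetector(Protocol):
    """Fixed interface: one YUV image in, a list of face rectangles out."""

    def detect(self, yuv_image: bytes, width: int, height: int) -> List[Rectangle]: ...


class CentreBoxDetector:
    """Placeholder algorithm that always reports one box in the image centre."""

    def detect(self, yuv_image, width, height):
        return [(width // 4, height // 4, width // 2, height // 2)]


class NoFaceDetector:
    """Second placeholder algorithm that never finds a face."""

    def detect(self, yuv_image, width, height):
        return []


def count_faces(detector: FaceDetector, yuv_image: bytes, width: int, height: int) -> int:
    """Downstream code depends only on the interface, not on the algorithm behind it."""
    return len(detector.detect(yuv_image, width, height))
```

Because `count_faces` only relies on the interface, either detector can be passed in without any change to the downstream code, which is the property that makes components interchangeable between project partners.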


4.3.3 The Off-the-Shelf Component Library 


The off-the-self component library represents all the already available com- 
ponents in the middleware. This is an important part of it because it 
allows accelerating the application development by using and reusing already 
developed component. Here are a few categories of components: 


• Sensor interface: This category comprises all the components that
read from or write to a sensor. When a sensor is present in the library, the
user just has to drop the corresponding component on the current diagram
and configure it to retrieve the data. This can be done easily and saves
considerable time.
Figure 4.4 Components and interfaces.

• Data generators: When the time comes to test a component, it can
be useful to emulate a missing sensor with randomly generated data
(vectors, CAN frames, images). This does not replace real sensors, but it
is sometimes sufficient.

• Viewers: Very important components, which display information
about a data stream during execution (images, vectors, CAN
frames, etc.). As an example, the DataViewer (Figure 4.5) can display
generic information (timestamps, size, etc.) and specific information (width and
height of an image if the current data is an image) as a tree. This is very useful
to inspect data along a processing chain and check that each component
behaves correctly.

• Player and Recorder: These components record and replay any
data stream. Using a recorder, the user is able to record any scenario
(outdoor session, motorway driving test, automatic car parking, etc.) and
replay it at the office with exactly the same data and timestamps.

Figure 4.5 Inspecting data with the data viewer.
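
The record and replay principle can be illustrated with a minimal Python sketch. The line-based JSON format and the function names `record` and `replay` are assumptions made for this example; they do not describe the recording format of RTMaps or ADTF.

```python
import io
import json


def record(samples, stream):
    """Write (timestamp_us, payload) pairs as one JSON object per line."""
    for timestamp_us, payload in samples:
        stream.write(json.dumps({"t": timestamp_us, "data": payload}) + "\n")


def replay(stream):
    """Yield the recorded samples in order, with their original timestamps."""
    for line in stream:
        entry = json.loads(line)
        yield entry["t"], entry["data"]


# Hypothetical CAN-like samples: (timestamp in microseconds, payload bytes as ints).
session = [(1000, [0x10, 0x20]), (1500, [0x11, 0x21]), (2750, [0x12, 0x22])]

log = io.StringIO()  # stands in for a file on disk
record(session, log)
log.seek(0)
replayed = list(replay(log))
print(replayed == session)  # True
```

Keeping the original timestamps with every sample is what allows a downstream component to see the same data at the same relative times during replay.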
4.3.4 Custom Extensions


Extending the component library is done through the SDK, whose purpose is to
expand the capabilities of the middleware through the creation of new components.
In RTMaps, for example, the SDK is available for both C++ and Python
(Figure 4.6). Thanks to this SDK, users can integrate their own code into
a component and use it directly in a diagram.

Once a new component has been created, it can be shared with others.
When using C++, each component is compiled code, which means that only
the binary is used in the middleware and the intellectual property (IP) is
preserved. Anybody can share their work while keeping the source secret.


4.3.5 About Performance 


Using a high-performance middleware is still essential nowadays. Indeed,
even though computing power keeps increasing, the trend is
to run applications on embedded systems with the smallest possible footprint.
The explanation of this trend is quite simple: the prototype vehicle has to be as
close as possible to the real vehicle. In many companies, desktop computers
in the trunk of the car are no longer allowed; all systems have to be (or at least
look) embedded.

Figure 4.6 Developing a new component.

Furthermore, the middleware is pushed further and further along the
development chain. A few years ago, most middleware was used
only for prototyping, and once the prototype application was finished, all the work
had to be redone on dedicated hardware. This is no longer the case: now
the middleware should be able to run on the low-power boards that equip
pre-series cars.

Consequently, OEMs are looking for high-performance middleware that
runs on small form factor boards as well as on personal computers, so that
working in the lab or on real scenarios makes no difference.


4.4 Compatibility with Other Tools 
4.4.1 dSPACE Prototyping Systems 


In the frame of the DESERVE project, a bridge has been developed between the
dSPACE MicroAutoBox and RTMaps (Figure 4.7). The dSPACE MicroAutoBox
is the de facto standard for real-time control loops such as chassis control,
body control and powertrain. Combining this dSPACE prototyping system with
the RTMaps middleware provides an extremely powerful framework capable
of multisensor acquisition, data processing and actuator control in
a hard real-time way.

The MicroAutoBox typically serves as an embedded controller to process
the ADAS application algorithms in real time and to interface with the vehicle bus,
sensors and actuators. It is a prototyping ECU with a predefined set of I/O,
which is qualified for in-vehicle use.

In the context of the DESERVE project this platform was extended by an
embedded PC and an FPGA board. The embedded PC features a multi-core
Intel® Core™ i7 processor running at 2.5/3.2 GHz and the connection to the
actual embedded controller is implemented via an internal Gigabit Ethernet
interface. The embedded PC integrated in the MicroAutoBox can be used to
flexibly run any x86-based development framework available for prototyping
perception and fusion algorithms, such as RTMaps, and to easily exchange
data with the embedded controller [3].

Figure 4.7 dSPACE MicroAutoBox and RTMaps bridge.


4.4.2 Simulators 


ADAS are increasingly promoted because several of their key functions
increase the level of vehicle safety. Most of the time, it is a challenge
to access the equipment and sensor information on vehicles, which makes it
difficult to design and test new algorithms. Some of the applications
are based on perception sensors on board the vehicle, which interact with
the vehicle, driver and environment through electronic control units. For these
reasons, simulating the algorithms and analyzing existing solutions
for virtual testing are very important tasks.

Using simulators has many advantages: the scenario can be tuned at will (adding
rain or fog as in Figure 4.8), dangerous situations for which real data is hard to
obtain can be tested, the output of any algorithm can be used to modify the
scenario of the simulator (closing the loop), etc. It is now widely accepted that
virtual testing allows a massive cost reduction. In the DESERVE project, many
simulators have been used in collaboration with RTMaps: ProSivic [4],
dSPACE ASM [5], etc.

Figure 4.8 ProSivic working together with RTMaps.


4.4.3 Other Standards 


Middleware supports other standards as well. RTMaps implements the DDS
[6] standard interface via the PrismTech OpenSplice DDS implementation.
This is very convenient for streaming data from RTMaps to anywhere and vice
versa. This DDS interface was developed in the frame of the DESERVE
project.

Other standard protocols are also supported, like XIL or XCP, which allow
manipulating RTMaps with off-the-shelf tools that implement those protocols
themselves.

Of course, most of the middleware on the market also supports NMEA,
CAN/DBC, RTSP, I2C, GPS, SIP, TCP and UDP. Compatibility
with major industry standards is essential so that the middleware interacts
painlessly with other tools.


4.5 Conclusion 


Most DESERVE partners have been using the RTMaps and ADTF middleware
as the common perception platform to speed up their development processes
and exchange components with each other.

Indeed, partners like Continental, FICOSA, Vislab and CTAG have
encapsulated their acquisition routines and custom algorithms into RTMaps
components, which in turn have been integrated into a global acquisition and
processing diagram by other partners (OEMs most of the time). This modular
approach eased the collaboration between a large number of partners,
which was one of the difficulties of the DESERVE project.

As another example, CRF (Centro Ricerche Fiat) has used RTMaps and the
bridge to the MicroAutoBox, developed in the frame of the DESERVE
project, for their emergency braking application. The sensor acquisition,
the pedestrian detection, the information display and the braking command
are all handled via RTMaps.

In conclusion, having a middleware in the DESERVE project
allowed engineers to focus on their main activity (ADAS functions
here) rather than on advanced programming issues, and it was also very helpful
for exchanging components between partners.
5.1 Introduction


An ADAS function developed within the DESERVE platform and the tuning of
this function for a particular application are discussed in this chapter. Because
the software and the tuning data are separated, according to the standards
described in detail in Chapter 2, such a function can also be used for a
different vehicle or application use case. The opportunities as well as the
potential challenges are described using a real-world example developed
within the DESERVE project.


5.1.1 Parameter Tuning: An Overview


Tuning or calibration of vehicle components essentially means determining the
optimum attributes, which fulfill the legislative standards and refine the
car’s character to meet the driver's expectations for drivability and
comfort. Besides comfort and legislative issues, vehicle tuning also
supports brand differentiation and helps to define the vehicle character.

In the tuning task for a specific component (e.g., the engine), the software and
the tuning data in the application layer of an Electronic Control Unit (ECU)
are separated, as illustrated in Figure 5.1. The resulting code is a hex file,
which can be flashed to the defined controller hardware. This gives great
flexibility in powertrain development. As an example, one engine hardware
can be put into more than 200 vehicle variants for different countries,
different vehicles and/or different transmission systems, just by flashing
the appropriate controller software.
Figure 5.1 Separation of software and tuning parameters in a control unit.
(Figure annotations: both are compiled into a hex file; this hex file is used
in, and can be modified by, application systems (CRETA...).)
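
The separation of software and tuning data can be sketched as follows. The sketch is purely illustrative: the function `torque_limit`, the parameter name `TORQUE_LIMIT_CURVE` and all numerical values are invented for this example, and the compilation into a hex file is not modelled.

```python
from bisect import bisect_right


def interpolate(curve, x):
    """Linear interpolation in a characteristic curve [(x0, y0), (x1, y1), ...]."""
    xs = [p[0] for p in curve]
    if x <= xs[0]:
        return curve[0][1]
    if x >= xs[-1]:
        return curve[-1][1]
    i = bisect_right(xs, x)
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def torque_limit(engine_speed_rpm, calibration):
    """The 'software': identical code for every vehicle variant."""
    return interpolate(calibration["TORQUE_LIMIT_CURVE"], engine_speed_rpm)


# The 'tuning data': one data set per variant (values invented for illustration).
VARIANT_ECONOMY = {"TORQUE_LIMIT_CURVE": [(1000, 150.0), (3000, 200.0), (5000, 160.0)]}
VARIANT_SPORT = {"TORQUE_LIMIT_CURVE": [(1000, 180.0), (3000, 260.0), (5000, 240.0)]}

print(torque_limit(2000, VARIANT_ECONOMY))  # 175.0
print(torque_limit(2000, VARIANT_SPORT))    # 220.0
```

The function body never changes between variants; only the data set passed to it differs, which mirrors the way one ECU software can serve many vehicle variants.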


5.1.2 Industrial Tuning Applications: Challenges 
and Opportunities 


The engine ECU was the first mechatronic application in the automotive
world. It therefore makes sense to take a short look at the historical
development of the tuning task in this field, as illustrated in Figure 5.2.

In the past decades, improving technology in the automotive sector
has produced cars with better engine performance, lower consumption,
better handling and reduced emissions. But this improvement
has come with increased complexity, especially in the tuning task.

Figure 5.2 History of powertrain tuning (calibration).

As can be seen in Figure 5.2, initially only around 500
parameters needed to be tuned. This was carried out by a single
engineer on the unit under test, which was then tested on a single test
vehicle. The powertrain was quite simple and the engine ECU was
the only one being considered.

With increasing legislative and user demands, the complexity of the
technology, the number of interacting components involved (engine, gearbox and
electric motor) and the number of functions controlling the interactions
between all the variable components increased dramatically. Furthermore, tuning
allowed the derivation of many more vehicle variants with the same hardware
components but a different ECU software (SW), in which the functions
stay the same and only the tuning data are specifically developed.

This effect is also seen in the number of tuning parameters to be defined
in an engine calibration project, where around 50,000 parameters, assigned
to many functions, have to be defined. It is therefore no longer possible for
one person to understand all the implemented functions, and teams
of specialists are necessary, some of them working in different parts of the
world. Thus the industry was confronted with several challenges and found
some responses.

For example, the management of tuning data becomes an issue. It must 
be possible to track all the changes made to the tuning data by the different 
engineers involved and bring all the tuning results into a single final tuning 
result. The company should be able to ensure at Start of Production (SoP) 
that: 


1. All the tuning data are calibrated. 
2. All the tuning data are calibrated with the correct settings to optimally 
fulfill the desired, derivative use case. 


These two requirements are very challenging, which explains the need for
“Tuning Data Management”. This topic is not elaborated further in this
chapter, but it is covered in the literature [1, 2].

Another challenge lies in the tuning for single use cases, for example
the emission tuning of an engine in a certain vehicle configuration for the
legislation of a specific country. There are about 5 to 10 strongly interacting
tuning parameters. For instance, an engine map defining the start of combustion
as a function of speed and load counts as one of these parameters; the exhaust
gas recirculation rate, rail pressure, boost pressure and split patterns of the
injected fuel quantity are others, all of which either reduce the different kinds
of emissions or change fuel consumption or noise.

One can imagine that it is simply not possible to measure the emissions
and the fuel consumption for all feasible combinations of, say, 8 such
parameters on an engine. (A similar issue is faced with ADAS functionality.)
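
A short calculation makes the size of this problem concrete. The number of levels per parameter and the time per measurement below are assumptions chosen only for illustration; they are not taken from a real calibration project.

```python
def full_factorial_runs(n_parameters, levels_per_parameter):
    """Number of test points if every combination of levels is measured."""
    return levels_per_parameter ** n_parameters


def measurement_days(n_runs, minutes_per_run):
    """Continuous test bed time in days for n_runs measurements."""
    return n_runs * minutes_per_run / (60 * 24)


# Assumed values: 8 parameters, 5 levels each, 3 minutes per stabilized measurement.
runs = full_factorial_runs(8, 5)
print(runs)                              # 390625
print(round(measurement_days(runs, 3)))  # 814
```

Even with only five levels per parameter, a full grid would occupy a test bed for more than two years of uninterrupted operation, which motivates the statistical test planning described later in this chapter.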

Such tasks are typically performed on engine test beds and chassis dynos
and finally have to be validated on the road. With the latest legislation
(Real Driving Emissions, RDE) even the certification will be done on the road,
which poses an additional challenge [3-5].

Figure 5.3 illustrates the generalized development environment, which
allows the engineer to reproduce maneuvers and then double-check the results
of the tuning work. In the manual tuning method, the engineer operates the
unit under test (UUT) with a certain setting of control parameters in certain
maneuvers. The engineer observes the behavior of the UUT and judges it
according to experience. Then the next setting is defined with the intention of
approaching the desired behavior more closely. This process becomes complex
when there are many relevant tuning parameters [6].

In this trial-and-error method, the quality of the tuning and the optimization
results depend on whether the engineer considers all the parameters that are
relevant for the desired behavior and on the chosen start point. There is a strong
dependence on the experience of the engineer. There are also limitations on the
number of tests that can be conducted, due to testing time, complexity and
cost. The final results are highly subjective, as the decision-making
process lacks traceability, and the results cannot be reused for future projects,
e.g. tuning an ADAS setup for a different drive mode. As a result, a methodology
that increases the efficiency and the quality of the tuning work at the same time,
the so-called "Design of Experiments" (DoE) method, was adapted accordingly.

Figure 5.3 Illustration of a generalized development environment and manual tuning
process.

Within the DESERVE context this methodology was applied as “Design
Space Exploration” in simulation environments, which are excellent
development environments for the tuning of ADAS functions.

The model-based approach was used with two objectives: 


• Firstly, to find an optimum tuning result.
• Secondly, to validate an existing tuning result under a wide variety of use
cases that will occur during the lifetime of a vehicle.


5.1.3 Model-based Tuning 


Model-based tuning is a statistical approach which reduces
the number of actual experiments or test runs needed to accurately describe the
behavior of the UUT within the design space. This method helps to choose the
positions of the test points so that behavior models can be generated with
an efficiently low number of measurements. Such models are then used
to develop an accurate and robust tuning according to specific optimization
target(s). Figure 5.4 illustrates the entire method for the generalized
development environment.


Figure 5.4 Model-based tuning task illustrated. (Elements shown in the figure:
task planning with targets and constraints; test plan generation and intelligent
test execution using Design of Experiment (DoE); variation of control unit
parameters; behavior model for emission, performance and driveability;
optimization; verification; calibration decision and variants.)




Tuning of ADAS Functions Using Design Space Exploration 


A model-based tuning task follows the steps below:


• The user begins with task planning for the measurement series, in which
the targets for the tuning task are determined. Based on the targets, the
relevant input parameters which are considered to influence the observed
UUT response are selected. AVL CAMEO is used for the test plan
generation. This is based on a one-time setup process, in which CAMEO
is connected to the development environment. CAMEO thus gets access
to set tuning parameters in the UUT, to observe responses of the UUT,
to start and stop maneuvers and to take measurements after a maneuver. The
development environment hosting the UUT could be a test
bed, a hardware-in-the-loop (HiL) system or even a vehicle simulation software
like IPG Carmaker in combination with an ADAS function prototype
programmed in MATLAB.

• Once the targets have been defined, the next important step is to create
the test matrix. In order to get a full picture of the area to be investigated,
Design of Experiments (DoE) is used [7]. It is a systematic
technique which allows varying all the parameters simultaneously while
answering the two important questions of every tuning activity: Firstly,
how many tests are needed to cover the entire design space? And
secondly, at which locations in the design space are test points needed
to obtain modelling equations that are valid throughout the entire design
space? There are many DoE designs available in AVL CAMEO,
but the COR DoE methodology [8] was used in the current example.
Besides setting up the test design, it is also important to set the
limits for the test and the appropriate actions when a limit is violated.
These topics are addressed further in the example discussed in
Subsection 6.2.1.

• With the test plan and limits decided, the tests are run: the necessary
parameter settings are uploaded to the UUT by CAMEO and, after each
test, the required measurement results are stored in CAMEO. The raw
measured data are then checked for plausibility and feasibility. This check
is necessary to get a rough idea of how the measurements compare with
expected values and to detect possible errors which could have occurred
during test execution.

• The measurements are modeled empirically to obtain behavior models of
the UUT. In this context, modeling essentially means fitting a function,
such as a polynomial, to the measured responses in
order to estimate the response at any point in the design space.
Such a model helps to understand the reaction of the UUT to the parameter
tuning and the interaction between the different tuning input parameters and
the output measurements. The confidence and prediction intervals of the
empirical models are examined to evaluate the model quality. Models in
CAMEO also allow extrapolation in defined ranges beyond the design
space covered by measurements, in order to observe the UUT behavior at points
where tests could not be run because of equipment limitations or time/cost
constraints.

• Based on the optimization target, optimization algorithms can be
applied for a single objective or for multiple objectives. The engineer can
decide whether the results meet the targets and constraints and, in the case of
multiple objectives, decide on a suitable tradeoff between the different desired
targets (Pareto front).

• Before the results from the analysis are accepted, a final verification test is
carried out. Tests are run at least at the chosen optimum, but
can also be extended to the parameter settings of ten or more points spread
across the Pareto front. If these verification measurements match the
modeled results, the empirical models are accepted and the engineer
can use the optimization results as the desired tuning setting.
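
The tradeoff step with multiple objectives can be illustrated by a simple non-dominated filter, which keeps only the candidates on the Pareto front. This is a generic sketch and not the optimizer used in AVL CAMEO; the candidate settings and their objective values are invented.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and better in one.

    Objectives are tuples of values to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates that are not dominated by any other candidate."""
    return [
        c for c in candidates
        if not any(dominates(other["objectives"], c["objectives"]) for other in candidates)
    ]


# Invented model predictions: objectives = (fuel consumption, NOx), both minimized.
candidates = [
    {"setting": "A", "objectives": (5.0, 0.30)},
    {"setting": "B", "objectives": (5.5, 0.20)},
    {"setting": "C", "objectives": (6.0, 0.25)},  # dominated by B
    {"setting": "D", "objectives": (4.8, 0.40)},
]

front = pareto_front(candidates)
print([c["setting"] for c in front])  # ['A', 'B', 'D']
```

The engineer then chooses one setting from the front according to the desired tradeoff, and the verification tests are run at that setting or at several points along the front.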


5.1.4 Model-based Validation 


Model-based validation is a task carried out to test and evaluate the
robustness of the results from the tuning task. The UUT is run at the parameter
settings obtained from the tuning task, but tested for an alternate use case,
and the response is evaluated. For example, suppose a diesel engine was tuned
to operate in an economy mode and a sport mode with strict limits on
NOx emissions. Economy mode encourages the engine to conserve fuel while
sacrificing power, while sport mode encourages the engine to provide
greater power while compromising on fuel economy, with the engine
running more at higher RPMs. The engine is initially tuned under driving
conditions imitating an urban environment and lower altitudes, and from the
tuning task the input parameter settings such as the rail pressure, injection
pressure, injection timing, etc. are selected to operate the engine in the two
targeted modes while respecting the NOx limits. In the validation test run the
engine is first run in economy mode and then in sport mode, but now the use
case involves hilly road conditions and higher altitude. The engine performance
is evaluated with respect to power and emissions while the road and the altitude
of operation are varied. The aim is to see whether the tuning settings can be
extrapolated or extended to alternate use cases. It also gives further information
on how the engine tuned for urban conditions would perform in rugged, hilly
conditions.


5.2 Demonstrative Example 


A map-based ACC function (developed by the DESERVE partner CRF)
running in a commercially available MiL environment (IPG Carmaker +
MATLAB Simulink) has been used as an example. The calibration tool
AVL CAMEO was connected to this environment in order to tune the function
for a Fiat 500L.


5.2.1 Function: An Overview 
A map-adaptive autonomous cruise control (ACC) was developed to: 


• Control the vehicle velocity in order to enter and exit curves in a
comfortable and safe manner.
• Complete the drive maneuver in the least amount of time.


The controller function controls the vehicle speed by sending jerk requests
(see Figure 5.7). Jerk is the rate of change of acceleration; hence the jerk
request signals from the controller function are converted into vehicle
acceleration and speed. For the reference maneuver a digitized road was used
and a reference speed curve was determined, which is the maximum speed
at which this road can be driven safely. The function tries to ensure
that the vehicle follows this reference speed profile as closely as possible
without exceeding it. The target speed was set to 130 km/h for the ACC.
A demonstrative speed profile is shown in Figure 5.5 for a sample setting of
the ACC function. It can be seen that the vehicle velocity tries to follow the
reference velocity while never exceeding it. The vehicle velocity cannot
exactly replicate the reference velocity due to the road conditions, the vehicle
limitations and the control function settings.

Figure 5.5 Velocity profiles for a sample test run using the control function.

Figure 5.6 Function developed using IPG Carmaker and MATLAB Simulink.
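
The conversion of a jerk request into acceleration and speed is a double integration over time. The following sketch uses a simple forward Euler scheme with a fixed time step; the function name `integrate_jerk`, the step size and the request profile are assumptions for illustration and do not reproduce the Simulink model of the function.

```python
def integrate_jerk(jerk_requests, dt, a0=0.0, v0=0.0):
    """Integrate jerk (m/s^3) to acceleration (m/s^2) and speed (m/s).

    Uses a simple forward Euler scheme with a fixed step dt (s).
    """
    accelerations, speeds = [], []
    a, v = a0, v0
    for jerk in jerk_requests:
        a += jerk * dt
        v += a * dt
        accelerations.append(a)
        speeds.append(v)
    return accelerations, speeds


# Invented request: 1 m/s^3 for 2 s, then hold the acceleration for 2 s.
dt = 0.5
requests = [1.0] * 4 + [0.0] * 4
acc, vel = integrate_jerk(requests, dt)
print(acc[-1])  # 2.0
print(vel[-1])  # 6.5
```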

The function was developed using IPG Carmaker for Simulink and is
illustrated in Figure 5.6. IPG Carmaker for Simulink is integrated into
MATLAB/Simulink, and the necessary modifications were made by adding the
custom Simulink blocks developed for the current use case.


5.2.2 Design Variables 


In order to tune the function for the reference maneuver, four input parameters
or design variables were selected (see Figure 5.7). As per the terminology used
in CAMEO these tunable input parameters will be referred to as the variation
parameters. The variation parameters selected for the tuning task are:


• Acceleration Maximum (A_MAX) limits the maximum positive
acceleration the vehicle can have while safely completing the maneuver. The
negative acceleration is not limited so that the vehicle can generate
the necessary braking force in case of obstacles.

• Jerk Maximum (J_MAX) limits the maximum positive jerk request from
the controller function used to meet the reference velocity curve.
Only the positive jerk delivered by the engine and responsible for positive
acceleration is limited, while there is no lower limit for negative jerk,
for the reasons mentioned above.

Figure 5.8 illustrates the kinematic parameters, with acceleration being
the derivative of velocity and jerk the derivative of acceleration.

Figure 5.7 Function overview.

Figure 5.8 Illustration of the kinematic variables A_MAX and J_MAX.

• Forward Time (FORWARD_TIME) is a gain factor that transforms the
jerk request from the controller function into an acceleration request. Even
though the controller function is based on jerk and sends the desired
jerk requests for the vehicle, the interface to control vehicle motion is
based on acceleration. Hence the desired value of acceleration is required
to control the vehicle. In order to obtain the desired acceleration
from the requested jerk, one has to look forward for a given time,
which is called Forward Time. Mathematically it is defined by the
formula

$$A_{\mathrm{req}} = A_0 + J_{\mathrm{req}} \cdot \mathrm{FORWARD\_TIME}$$

where A_req is the acceleration request, A_0 is the current vehicle
acceleration and J_req is the jerk request generated by the controller function.


• Jerk Horizon (J_HOR) is a parameter used to determine when the
controller function sends the necessary jerk requests, and the required
jerk magnitude, in response to an approaching curve. To define what
is "near" and "far" (with respect to the distance from the approaching
curve) for the controller function, the parameter J_HOR is used, where
HOR stands for the horizon points (of the electronic horizon) to be
considered. J_HOR is always negative. Values closer to zero
make the controller respond to the approaching curve when it is further
away, with a smaller deceleration demand. Larger negative values tell the
controller to respond when the approaching curve is closer,
but with a larger deceleration. A pictorial representation is given in
Figure 5.9.

The black line represents the target velocity set for the controller and
the reference velocity curve is given in red. As explained previously, the
controller tries to keep the vehicle speed (in blue) as close as possible
to the reference speed.

The mathematical expression "A_MAX + J_HOR * time" determines
the funnel shape of the vehicle velocity curve (shown in blue). More
negative J_HOR values give the velocity curve a sharper shape, while values
closer to zero give the velocity curve a flatter shape.
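
The Forward Time formula and the positive limits A_MAX and J_MAX can be combined in a short sketch. How the two limits are applied relative to the formula is an assumption made here for illustration (the jerk request is limited first, then the resulting acceleration request); the function name `acceleration_request` is hypothetical.

```python
def acceleration_request(a0, j_req, forward_time, a_max, j_max):
    """A_req = A_0 + J_req * FORWARD_TIME, with positive limits applied.

    Only positive jerk and positive acceleration are limited; braking
    (negative values) is left unrestricted, as described in the text.
    The order in which the limits are applied is an assumption.
    """
    j_limited = min(j_req, j_max)
    a_req = a0 + j_limited * forward_time
    return min(a_req, a_max)


# Values in SI units, chosen within the ranges of the tuning task.
print(acceleration_request(0.5, 1.0, 1.0, a_max=3.0, j_max=2.0))   # 1.5
print(acceleration_request(0.5, 4.0, 1.0, a_max=3.0, j_max=2.0))   # 2.5 (jerk limited)
print(acceleration_request(2.0, 2.0, 1.0, a_max=3.0, j_max=2.0))   # 3.0 (acceleration limited)
print(acceleration_request(0.0, -6.0, 0.5, a_max=3.0, j_max=2.0))  # -3.0 (braking not limited)
```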


The ranges of the variation parameters examined in the tuning task are
shown in Table 5.1.

Figure 5.9 Illustration of the design variable (variation) J_HOR.


Table 5.1 Range of variation parameters used in the tuning task

Design Variable      From   To
A_MAX (m/s^2)        1      5
FORWARD_TIME (s)     0.1    2
J_HOR (m/s^3)        -5     -0.2
J_MAX (m/s^3)        1      3


5.2.3 Key Performance Indicators (KPI) 


The output variables used to demonstrate how effectively the tuning task
meets the targets are described below and illustrated in Figure 5.10:


• Mean speed: The mean of the vehicle speed in each test run is indicative
of the sportiness of the driving experience. A higher mean speed helps to
finish the test maneuver in less time and makes the driving
experience sportier.

• Speed below reference: The reference speed curve is the maximum speed
at which the vehicle (Fiat 500L) can drive the digital test track
without leaving the road in the reference use case. Hence, for
vehicle safety, it was ensured that the vehicle speed during the tuning
task was always below the reference velocity.

• Jerk RMS: Vehicle jerk, which is the rate of change of vehicle
acceleration, is indicative of the driving comfort. Lower jerk
gives a more comfortable ride, so the root mean square of the jerk in a test run
is a good indication of the driving comfort.

Figure 5.10 Key performance indicators.
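
The three KPIs can be computed from sampled signals as sketched below. The sample values are invented, and estimating the jerk by finite differences of the acceleration is an assumption of this sketch; in the actual tool chain the signals come from the simulation.

```python
import math


def mean_speed(speeds):
    """Arithmetic mean of the sampled vehicle speed."""
    return sum(speeds) / len(speeds)


def stays_below_reference(speeds, reference_speeds):
    """True if the vehicle speed never exceeds the reference speed."""
    return all(v <= v_ref for v, v_ref in zip(speeds, reference_speeds))


def jerk_rms(accelerations, dt):
    """Root mean square of the jerk, estimated by finite differences."""
    jerks = [(a1 - a0) / dt for a0, a1 in zip(accelerations, accelerations[1:])]
    return math.sqrt(sum(j * j for j in jerks) / len(jerks))


# Invented samples at dt = 1 s (speeds in m/s, accelerations in m/s^2).
speeds = [20.0, 22.0, 24.0, 24.0]
reference = [25.0, 25.0, 25.0, 24.0]
accelerations = [2.0, 2.0, 0.0, 0.0]

print(mean_speed(speeds))                        # 22.5
print(stays_below_reference(speeds, reference))  # True
print(jerk_rms(accelerations, 1.0))
```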


5.2.4 Test Maneuver 


The test maneuver consisted of a 5000 m test run on a digitized road imitating
the road between Ceva and Savona in Italy, run in IPG Carmaker for Simulink
(CM4SL). The IPG Carmaker environment is illustrated in Figure 5.11. The top
left is the Carmaker for Simulink main GUI, showing details about the vehicle,
simulation speed, time and distance of the maneuver, etc. The bottom left
imitates the car instrumentation. The top right is a time-based plot of the car
speed and the vehicle jerk. The bottom right is IPG Movie, which shows the
overall test run as an animation.


5.2.5 Test Run Overview 


The test run overview is illustrated in Figure 5.12. The test parametrization
was done in AVL CAMEO, where a space-filling DoE design with the four
variations was used. The variations were then uploaded to CM4SL through
the CAMEO-Carmaker interface, where the test maneuver was run for each
variation setting. AVL CAMEO then stores the measured quantities
observed as the KPIs for further evaluation.

Figure 5.12 Test run overview illustrating the work flow.
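
To give an impression of what a space-filling design over the four variations looks like, the sketch below draws a Latin hypercube sample within the ranges of Table 5.1. A Latin hypercube is one common space-filling design and is used here as a stand-in; it is not the design generated by AVL CAMEO. The number of points and the random seed are arbitrary choices.

```python
import random

# Ranges from Table 5.1.
RANGES = {
    "A_MAX": (1.0, 5.0),         # m/s^2
    "FORWARD_TIME": (0.1, 2.0),  # s
    "J_HOR": (-5.0, -0.2),       # m/s^3
    "J_MAX": (1.0, 3.0),         # m/s^3
}


def latin_hypercube(ranges, n_points, seed=0):
    """Draw n_points so that each parameter range is split into n_points
    equal strata and every stratum is used exactly once per parameter."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        strata = list(range(n_points))
        rng.shuffle(strata)
        step = (high - low) / n_points
        columns[name] = [low + (s + rng.random()) * step for s in strata]
    return [{name: columns[name][i] for name in ranges} for i in range(n_points)]


plan = latin_hypercube(RANGES, n_points=20)
print(len(plan))        # 20
print(sorted(plan[0]))  # ['A_MAX', 'FORWARD_TIME', 'J_HOR', 'J_MAX']
```

Each dictionary in `plan` corresponds to one variation setting for which the test maneuver would be run once.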
During parametrization, limits were set on the minimum (-2 m/s^3)
and maximum (2 m/s^3) acceptable vehicle jerk values. Whenever the vehicle
jerk violated these limits, the test run at that test point was halted
and no measurements were recorded. This reduced the effectiveness of the
overall DoE design, since the design space and consequently the number of
measurement points were reduced. To overcome this challenge the COR DoE
(Customized Output Range) method was used. It is an iterative method in
which alternate test points are first added by CAMEO to maintain the DoE
design. Then, based on these preliminary measurements, the design space is
modified further and additional test points are added in the relevant variation
space to improve the final information from the measurements. The
AVL CAMEO interface is illustrated in Figure 5.13, where the image on the
left shows the overall test parametrization and the image on the right
shows the test run window.


5.2.6 Raw Data Plausibility Check 


Before the mathematical modeling of the selected measured output variables,
the raw measurements were checked for plausibility. Firstly, the measured
variables were checked for outliers, as shown in Figure 5.14 for mean
speed. The measured values were within the acceptable range. The figure also
shows that the repetition points (a select number of test conditions, usually the
start condition, which are repeated to check the reproducibility of test results),
shown in green, were perfectly reproduced.

Figure 5.13 Left image illustrates the test preparation window while the right image
illustrates the test run window.

Figure 5.14 Checking for outliers in the measured variables.

The effect of the design space modification due to limit violations, and of the
design correction by the COR DoE method, can be seen in Figure 5.15. In a
certain range of variations for A_MAX, J_HOR and FORWARD_TIME there are no
test points. These points were skipped by AVL CAMEO because limit violations
were encountered when tests were carried out in this range. Conversely, a
greater density of test points in certain ranges of variations shows where the
COR DoE method added alternate or additional test points.


5.2.7 Meta Modelling 


The raw data plausibility check was followed by empirical modeling of the
output variables. The automatic modeling in CAMEO gave reasonable results
with a neural network model with local model order 2, as can be seen in
Figure 5.16, the Measured (Predicted) plot, which shows the fit of the model to
the measurement points. For a perfect match all points would lie along the
black line; in our case the measurement points are reasonably close to it.

Figure 5.15 Check of DoE design and the boundaries of variation parameters.

Figure 5.16 Figure depicting the quality of empirical modeling.


After checking the quality of modeling, intersection plots were used. These
represent a cut through the multidimensional model, showing the influence of
each variation depending on the values of the other variations. In Figure 5.17
the influence of the variation parameters on Speed_Mean and Jerk_RMS can be
observed.

Figure 5.17 Intersection plot highlighting the influence of each variation on
the output variables and their interaction. Horizontal axes: A_MAX [-],
FORWARD_TIME [-], J_HOR [-], J_MAX [-].

The confidence interval of the model is displayed as the green dotted line and
colored section. The narrow confidence interval shows a high quality fit. The
green bar on the x axis for each variation shows the total design space, and as
the confidence interval of the model in the extrapolated region is also narrow,
it shows good extrapolation capability of the model. Looking at the
intersection plots, it can be noticed that J_HOR and A_MAX have a strong
influence on the output parameters. The more negative J_HOR is, the later the
vehicle reacts to an approaching curve. Hence it is still travelling at a high
speed before decelerating to approach the curve safely. A higher mean speed is
therefore observed, but the resulting braking produces higher vehicle jerk,
reducing the driving comfort. The influence of A_MAX can be a bit
counterintuitive, but A_MAX is used to calculate J_HOR: the higher A_MAX, the
less negative J_HOR. Hence for higher A_MAX values J_HOR is closer to zero,
giving a smoother and slower ride. It can also be observed that a higher
FORWARD_TIME allows for a smoother and slower ride, because the controller can
take more time to achieve the desired acceleration.


5.2.8 Optimization 


From the intersection plot, it is possible to manually find values of the
variations which give a comfortable ride, a sporty ride, or an acceptable
compromise. But it is quite easy to miss the optimum or an acceptable
compromise when working with multiple input variations, hence the optimization
tool in CAMEO was used. In the current tuning scenario, the target was to
isolate two modes of operation, comfort mode and sporty mode. Hence a
multi-objective optimization was chosen, with limits set on the minimum desired
mean speed of 115 km/h and the maximum acceptable JERK_RMS of 0.28
(Figure 5.18).

Figure 5.18 Optimization setting window in AVL CAMEO.

Figure 5.19 Trade-off plot between comfort and speed.


The result is plotted in a trade-off plot as shown in Figure 5.19, where the
Pareto front is shown in steel blue, the blue points indicate the measurement
values and the yellow points are random space-filling points. The Pareto front
shows the possible optimum trade-off solutions, which can be considered equally
good because the only way to improve one objective would be to compromise on
the second objective. So by observing the Pareto front it is possible to define
an optimum for comfort mode and an optimum for sporty mode of operation
(Table 5.2).
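
The notion of a Pareto front for the two objectives (high mean speed, low
JERK_RMS) can be sketched in a few lines of Python. This is a minimal
illustration of non-dominated filtering, not the optimizer used in CAMEO. The
first two candidate points reuse the mean speed and JERK_RMS values of
Table 5.2; the remaining three are hypothetical and added only to show
dominated and non-dominated cases.

```python
def pareto_front(points):
    """Return the non-dominated (speed, jerk) pairs, sorted by speed.

    A point is dominated if another point has a speed at least as high and a
    jerk at least as low, with a strict improvement in at least one of them."""
    front = []
    for speed, jerk in points:
        dominated = any(
            s >= speed and j <= jerk and (s > speed or j < jerk)
            for s, j in points
        )
        if not dominated:
            front.append((speed, jerk))
    return sorted(set(front))


candidates = [
    (115.0, 0.09),  # comfort mode (Table 5.2)
    (120.0, 0.28),  # sporty mode (Table 5.2)
    (117.0, 0.30),  # hypothetical: slower and rougher than sporty mode
    (114.0, 0.12),  # hypothetical: slower and rougher than comfort mode
    (118.0, 0.18),  # hypothetical: a compromise between the two modes
]
front = pareto_front(candidates)
```

On the front, moving towards a higher mean speed always costs a higher
JERK_RMS, which is why all remaining points count as equally good trade-offs.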

In Figure 5.20 the vehicle performance when operating in the two modes can be
observed. The red velocity curve is the reference velocity and the blue
velocity curve is the actual vehicle velocity. It can be observed that the
actual velocity is always below the reference velocity, which was the safety
requirement. Also, the velocity changes in comfort mode are more gradual with
no sharp peaks, unlike in sporty mode where there are rapid fluctuations in
vehicle velocity. This behavior is also mirrored in the acceleration values in
both operation modes. The vehicle jerk curves (the red plot is the jerk request
generated by the controller and the blue plot the actual vehicle jerk response)
show much lower jerk values for comfort mode, while the sporty mode shows sharp
and frequent peaks in jerk value.

Table 5.2 Variation values for comfort and sporty mode

Mode     A_MAX  FORWARD_TIME  J_HOR  J_MAX  SPEED_Mean  JERK_RMS
Comfort  4.99   1.94          -0.84  1.0    115         0.09
Sporty   3.88   1.37          -184   3.36   120         0.28

Figure 5.20 Sporty mode vs comfort mode.


5.2.9 Verification 


The Pareto front consists of points, a majority of which come from model
extrapolation. In order to verify that the model extrapolates accurately,
ten random points were selected from the Pareto front and the test runs were
rerun for the corresponding variation values. The results from these test runs
were evaluated as verification points in CAMEO. Figure 5.21 shows the
extrapolated model (in red) and its prediction interval (in blue), and the
measured verification points and their modeling (in green). The measured
verification points lie within the prediction interval of the model, showing
the extrapolation accuracy of the model.

Figure 5.21 Verification plot to see how well the measured results from the
verification run fit the model results.


5.3 Model-based Validation 


Once the reference tuning task is completed, it has to be tested whether the
tuning results are still acceptable when running not the reference use case
but varying road characteristics. Will the comfort mode still allow for a
comfortable drive in different road situations? It would be infeasible to run
simulations on thousands of different roads, and doing so would also make it
difficult to recognize the influence of a specific road. In the current method
the two tuning modes are fixed and a system variation of a digitized road is
performed using the model-based approach to validate our tuning results.

The digitized road is shown in Figure 5.22, where the lengths of the straight
sections (L1, L2, L3, and L4) and the curvatures (R1, R2, R3) were varied while
keeping the total maneuver length at 5000 m. The controller settings were
fixed to run first in comfort mode and then in sporty mode, and the resulting
measurement output variables are shown in Figure 5.23.

Figure 5.22 Digitized road used for the validation run.

Figure 5.23 Measurements comparison when run on comfort mode (in blue) and
sporty mode (in red).


It can be seen in Figure 5.23 that for the sporty mode the resulting drive
comfort is lower, as indicated by the higher JERK_RMS. The length of the
straight portions does not influence the JERK_RMS for comfort mode as strongly
as in the sporty mode. The curvature of the turns seems to influence the output
in both operation modes. A JERK_RMS limit of 0.35 is expected, and it can be
seen that the limit is maintained in both modes of operation for the majority
of the design space. In the sporty mode the controller is set to maintain a
higher vehicle speed and responds to the oncoming curve only when it is close.
Hence the longer the straight sections, the larger the jerk experienced when
the vehicle decelerates rapidly to approach the curve, followed by a strong
acceleration on leaving the curve. For the comfort mode, the controller is set
to focus on keeping the vehicle jerk close to the minimum. The validation task
showed that, if the function (our UUT) is kept constant and the simulation
environment is changed, the function still manages to meet the expected vehicle
jerk targets. The influence of ‘L4’ on the jerk behavior needs to be further
investigated, as it strongly increases the vehicle jerk fluctuations at higher
values, especially for the sporty mode. To further explore and investigate the
influence of test track characteristics on the function response, the function
can be tested on a variety of road types and test tracks. This assists in
further improving the function performance.


5.4 Conclusions 


Virtual tuning of an ADAS function developed in a MiL environment using
an optimization tool can be a powerful combination for the development of a
brand's driver assistance system. The classical approach relies on a subjective
tuning of the ADAS function on a proving ground and public roads, which can
be supported and accelerated by using a virtual tuning environment. Using
DoE methods supported by AVL CAMEO, it was possible to increase the
number of tuning tests compared to manual tuning, and also the number of
target parameters and tests needed to match them. The possibility to use the
developed function for alternate use cases by separating the software and the
tuning data is a precondition for tuning work in general.

Independently of that, a model-based approach can also be very helpful in the
validation process, as the test coverage for a certain use case can be extended
to a wide range of possibly occurring variants of that use case. The robustness
of the key performance indicators considered relevant can be estimated.


6.1 Introduction


Today, vehicles contain a wide range of electronic driver assistance systems. 
These systems, for example Anti-lock Braking System (ABS) or Electronic 
Stability Control (ESC), increase car safety and on a more general level even 
road safety. More complex Advanced Driver Assistance Systems (ADAS),
like Lane Departure Warning, Overtaking Assistant, Collision Warning or
Emergency Braking, do not only observe the parameters of the vehicle itself,
but also require information regarding the environment. Future applications,
which target autonomous driving, need an even more detailed understanding
of the vehicle's environment and the current driving situation. Therefore,
vehicles are equipped with a number of sensors, which enable the perception
of the vehicle's surroundings including other road users. But the sensors
generally used deliver a huge amount of raw and unrefined data, from which the
necessary information needs to be extracted. For instance, for camera sensors,
an algorithm called Scene Labeling can be used to detect relevant objects in 
camera images. It assigns every pixel of an input image to a semantic class 
(e.g., road, car, free space etc.) and can therefore be used to extract detailed 
information from the scene. 

The increasing complexity of algorithms and the increasing amount of
data that has to be processed require a large amount of processing power. At
the same time, processing hardware is subject to restrictions regarding power
consumption and size. These conditions make the field of embedded hardware
platforms for driver assistance systems challenging.

This chapter is organized as follows: Section 6.2 gives an introduction 
to Scene Labeling techniques and their application in Advanced Driver 
Assistance Systems. Section 6.3 explains the concepts of Convolutional Neural
Networks and Deep Learning. In Section 6.4, an exemplary CNN is presented 
and evaluated. Section 6.5 describes different hardware platforms for Scene 
Labeling. Finally, Section 6.6 summarizes the chapter. 


6.2 Scene Labeling in Advanced Driver 
Assistance Systems 


Getting a thorough understanding of the vehicle’s environment is an important 
step in the development of advanced driver assistance systems. Different 
techniques for detection and classification of objects have been developed. 
Literature offers a wide range of algorithms for detecting traffic signs, 
traffic lights, driving lanes, and also other vehicles and pedestrians. In 
order to build up a comprehensive understanding of the environment, not
only do single objects have to be detected, but their relations to each other
also have to be determined. This is commonly referred to as Scene
Labeling.

Scene Labeling is a technique to classify images on different levels of 
detail. Image-level Scene Labeling (e.g., [1]) is used to derive one or more
labels for the whole image that describe different scene types, e.g., urban,
interurban, or highway. On another level, labels are deduced for small
subregions of an image, so-called regions of interest. This allows for a more
detailed understanding of the scene in terms of objects, like pedestrians,
vehicles, driving lanes, traffic signs and so on. On a third level of detail,
each pixel in an
input image is classified and provided with a semantic label. The information 
provided by these labels can be used in different applications, for example in 
pedestrian/obstacle detection, close range lane course estimation or relative 
map positioning. 

Scene Labeling can also be combined with other detection methods in 
order to increase reliability and thereby increase the integrity level of safety 
functions. Moreover, it can replace different detection modules in order to
save resources. 

The Scene Labeling task is usually performed in two steps. The first 
step extracts features from the input image; the second step computes a 
classification of the image, the region, or the pixels from the extracted features.
Several different features are used in order to perform image segmentation
and semantic labeling. Some algorithms rely on single, low-level features, 
like color [2], texture [3, 4], shape [3, 5], geometry [6], and edge features 
[7]. Object detection algorithms are used to extract high-level features, e.g., 
pedestrian detection [8], traffic sign detection [9], and lane detection [10]. 
Some algorithms perform labeling using image segmentation techniques, e.g., 
Super Pixels [11] or sliding windows using Boosting [12] to detect regions of 
one certain class, e.g., pedestrians or traffic signs. 

Classification of extracted features is performed using different techniques,
like Support Vector Machines [13], Genetic Algorithms [7], or Neural
Networks [14]. Probabilistic models like Conditional Random Fields (CRF)
[15] and graph-based optimization methods (e.g., Graph Cut [16]) are used 
to combine different features and include smoothness constraints or neighbor 
relationships. 

Recent advances in the field of deep learning and neural networks have yielded
a new technique for the scene labeling problem, which is described in the next
section.


6.3 Convolutional Neural Networks and Deep Learning 


Typical systems for detection and recognition of objects or situations use a 
two-step data processing scheme. In a first step, features are computed from 
data gathered through different sensors, like cameras, radar, etc. Then, a second
step uses the previously computed features in order to classify the candidates
into the object classes. The implementation of the classification step might
involve the use of machine learning techniques, i.e., the training of a classifier.
One difficulty in this scenario is the selection of features to be used. Often, 
these features are hand-crafted and a lot of work might be involved in tuning 
the parameters in order to find a set of features that can be used for reliable 
detection and recognition of objects. 

Another way of building recognition systems that has evolved recently is the
use of learning techniques, and especially the technique of deep learning with
close coupling between the feature extraction and feature classification steps.
Deep learning describes methods in which feature extractors are not
hand-crafted but automatically learned from a set of training data. Multiple
layers of feature extractors can be used in a hierarchical structure in order
to allow deeper layers to extract features of higher order from previous
layers. The idea behind this technique is that the learning algorithm itself is
capable of detecting the best features for the following classification step.
Commonly used implementations of the deep learning methodology are artificial
neural networks.


6.3.1 Introduction to Neural Networks 


Inspired by processes in the biological neural networks of the central nervous
system, and especially the brain, different computational models of artificial
neural networks have been developed [17]. Artificial neural networks are built
as a collection of relatively simple units, so-called neurons, that are
connected together to form a network which can process a complicated task. One
of the first models of neural networks is called the perceptron [18]. The simple
perceptron neurons perform binary decisions depending on their input values.
The input signals $x_i$ are weighted and accumulated. The neuron “fires”, i.e.,
produces an output signal $y$ of 1, if the weighted sum of the input signals
exceeds a given threshold value, and outputs 0 otherwise. The first networks
had one single layer of neurons and were only capable of computing linear
classifications. More complex networks with multiple layers were capable
of computing more complex classifications. Nowadays, neural networks use
a different model for the artificial neurons [19, 20], as depicted in Figure 6.1.
The input values, which are now real-valued, are weighted and
accumulated. Afterwards, a non-linear activation function is applied to the
sum. A commonly used activation function is the sigmoid function, which
can be interpreted as a smoothed threshold. Recently, rectified linear units
(ReLU) have been reported to have several advantages over the sigmoid
function [21]. Some exemplary activation functions are shown in Figure 6.2.

Figure 6.1 Model of an artificial neuron.

Figure 6.2 Exemplary activation functions used in neural networks.


The bias is another value summed up along with the weighted inputs. This
parameter influences the neuron's general activity, i.e., the likelihood of an
output activation of the neuron. For simplicity, the bias can be interpreted as
the weight for a constant input value of 1, so that all parameters of the
network can be interpreted as weights. Therefore, a neuron with inputs
$x_1, x_2, \ldots, x_n$, weights $w_1, \ldots, w_n$, bias $w_0$ (with
$x_0 = 1$) and activation function $f$ can be described mathematically as

$$y = f\left(\sum_{i=0}^{n} w_i x_i\right).$$


In so-called Multi Layer Perceptrons (MLP), neurons are arranged in layers.
The neurons of one layer are connected to neurons in the following layers.
No connections exist between neurons of one layer, and the graph formed by
the neurons and connections is a directed acyclic graph. Therefore, MLPs are
called feed-forward networks.
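
The neuron model and the layer-wise feed-forward evaluation can be sketched in
Python as follows. This is a minimal illustration under simplifying
assumptions: each layer is fully connected to the next one only, and the
weights of the small example network are hypothetical values chosen so that
the result is easy to check by hand. In each weight row, the first entry is the
bias $w_0$, which is paired with the constant input $x_0 = 1$.

```python
import math


def sigmoid(z):
    """Smoothed threshold activation."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified linear unit activation."""
    return max(0.0, z)


def neuron(weights, inputs, activation):
    """Compute y = f(sum_i w_i * x_i) with x_0 = 1 prepended for the bias."""
    total = sum(w * x for w, x in zip(weights, [1.0] + list(inputs)))
    return activation(total)


def mlp_forward(layers, inputs):
    """Evaluate the layers in order; each output feeds the next layer."""
    signal = list(inputs)
    for weight_rows, activation in layers:
        signal = [neuron(row, signal, activation) for row in weight_rows]
    return signal


# Hypothetical network: 2 inputs -> 2 hidden ReLU neurons -> 1 sigmoid output.
network = [
    ([[0.0, 1.0, -1.0], [0.0, -1.0, 1.0]], relu),
    ([[-0.5, 1.0, 1.0]], sigmoid),
]
output = mlp_forward(network, [2.0, 0.5])
```

For the input (2.0, 0.5) the hidden layer yields 1.5 and 0.0, so the output
neuron applies the sigmoid to -0.5 + 1.5 + 0.0 = 1.0.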

The task performed by the neural network depends on the parameters, 
namely the weights and biases. Therefore, the network parameters have to 
be adjusted before the network produces the correct outputs. This adjustment 
is called training. Different methods for training multi-layer feed-forward 
networks have been devised. The most commonly used technique is the 
backpropagation of error [22]. 


6.3.2 Supervised Learning 


In a neural network, the internal parameters (weights of the neurons) are
also called trainable parameters, since they can be trained to approximate a
desired function. In the case of Scene Labeling, this function would map a
pixel of an image to a specific label, using the pixel's neighborhood. For
classification tasks with a given set of classes, supervised learning schemes
are used. A set of training samples contains input images together with the
desired output. In combination with an error function, the training set can be
used to adjust the internal parameters of the network.


The Cost Function and Backpropagation 


Supervised learning for neural networks is performed by measuring the neural 
net's estimated output against the expected output with a so called cost 
function. The goal of a supervised training is to find the internal parameters 
which minimize this cost function regarding a set of training examples. Since 
the network in general models a highly non-linear function, gradient descent 
can be used as an optimization procedure. This is done by computing the 
gradient of the cost function and leveraging the chain rule to propagate the 
cost and the gradient back through each layer of the network. The weights in 
each layer are updated according to the current gradient of the backpropagated 
cost. This algorithm is therefore called backpropagation. 

A successful training converges towards the minimum value of the cost
function. It is important to choose a cost function suitable for the task that
the neural network needs to perform. For classification tasks, a combination of
the softmax function and (multinomial) logistic regression is often used
to train the internal parameters. The softmax function serves as a normalization
function, which maps input values $x_i$ of arbitrary range to values in the
range (0, 1) that add up to 1. When one input value clearly dominates, it maps
close to 1 while the other values map close to 0. The function is defined by

$$\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} \quad \text{for } i = 1, \ldots, n.$$

The softmax directly serves as the multinomial version of the logistic function
used in logistic regression. The resulting cost function is defined by

$$\mathrm{cost}(x) = -\ln(\mathrm{softmax}(x_k)),$$

with $x_k$ as the predicted output of the neural network for the actual class
$k$. The cost is therefore the negative log-likelihood of the expected class,
which is minimized when the estimated probability for that class is 1.
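
The softmax normalization and the resulting negative log-likelihood cost can
be written down directly. The following Python sketch is an illustration only;
subtracting the maximum input before exponentiation is a common numerical
safeguard that is not part of the definition above and does not change the
result, and the function names are chosen for this example.

```python
import math


def softmax(x):
    """Map arbitrary real inputs to values in (0, 1) that add up to 1."""
    peak = max(x)
    exps = [math.exp(value - peak) for value in x]  # shift avoids overflow
    total = sum(exps)
    return [e / total for e in exps]


def cost(x, k):
    """Negative log-likelihood of the actual class k (0-based index here)."""
    return -math.log(softmax(x)[k])


probabilities = softmax([1.0, 2.0, 3.0])
confident_cost = cost([10.0, 0.0, 0.0], 0)  # network strongly favors class 0
undecided_cost = cost([0.0, 0.0], 0)        # two classes, equal scores
```

The cost approaches 0 as the estimated probability of the actual class
approaches 1, and equals ln 2 when two classes receive equal scores.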


Stochastic Gradient Descent 


Gradient descent is an algorithm that finds a local minimum by iteratively
following the negative gradient of a function $F(x)$ at each point $x$. It can
be defined as

$$x_{i+1} = x_i - \eta_i \nabla F(x_i).$$

Here, $\eta_i$ is the so-called learning rate at iteration $i$. Choosing the
right $\eta$ in every iteration of the algorithm is crucial for the success and
the convergence speed of the optimization. If $\eta$ is too small, it takes
many iterations to find a local minimum. Furthermore, the detected local
minimum might just be a plateau with better local minima in the neighborhood.
If the chosen learning rate is too big, it is possible to jump repeatedly over
the local minimum without ever reaching it. In severe cases, it is even
possible that the algorithm diverges. There are several schemes for choosing
the learning rate adaptively, which in most cases result in a computational
overhead due to an additional analysis step at the current point of the
function. Often, a fixed initial learning rate is used, which is scaled down in
every iteration. Later iterations are supposed to be close to a minimum and
therefore require a finer-grained learning rate.
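
The update rule and the effect of the learning rate can be tried out on a
one-dimensional example. The sketch below is a minimal illustration; the test
function $F(x) = (x - 3)^2$, the starting point, the learning rates and the
multiplicative decay factor are all assumptions chosen for demonstration.

```python
def gradient_descent(grad, x0, eta0, decay, iterations):
    """Iterate x <- x - eta * grad(x), scaling eta by `decay` each step."""
    x = x0
    eta = eta0
    for _ in range(iterations):
        x = x - eta * grad(x)
        eta *= decay
    return x


def grad_f(x):
    """Gradient of the example function F(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


# Moderate learning rate that is scaled down in every iteration.
x_good = gradient_descent(grad_f, x0=0.0, eta0=0.1, decay=0.99, iterations=200)

# Learning rate that is too big: every step overshoots further.
x_bad = gradient_descent(grad_f, x0=0.0, eta0=1.1, decay=1.0, iterations=50)
```

With the moderate learning rate the iterate ends up very close to the minimum
at 3, while the oversized learning rate makes the distance to the minimum grow
by a factor of 1.2 in every step.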

Given the basic gradient descent update rule, the term $\eta_i \nabla F(x_i)$
can be called the update $v_i$ of iteration $i$. Since these updates only rely
on the current gradient, small bumps in the error function might lead to a
jittering path in the gradient descent, which increases the number of
iterations until a local minimum is found. This might especially occur in
stochastic gradient descent, which does not use every training sample in each
iteration. To overcome this, many learning schemes extend the update rule by a
momentum term. The update rule is then defined by

$$x_{i+1} = x_i - v_i$$

with a new definition for the update $v_i$:

$$v_i = \eta_i \nabla F(x_i) + \mu v_{i-1} \quad \text{and} \quad v_0 = 0.$$

The parameter $\mu \in \mathbb{R}$ ($\mu \geq 0$) denotes the influence of the
update from the previous iteration. If $\mu = 0$, no momentum is used to
calculate the current update. Update steps are stabilized and the “velocity” in
flat valleys of the error function is increased by using a momentum. However,
this property is not desired in all gradient descent schemes, because the
momentum might also cause the update to overshoot. Hence, the momentum term
should be used with care.
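
The momentum update can be sketched in Python as follows. The example is an
illustration under assumptions: it uses a constant learning rate, the
one-dimensional test function $F(x) = (x - 3)^2$, and parameter values picked
so that the benefit of the momentum term is visible with a small learning
rate.

```python
def momentum_descent(grad, x0, eta, mu, iterations):
    """Gradient descent whose update also carries mu times the previous update."""
    x = x0
    v = 0.0  # no previous update exists before the first iteration
    for _ in range(iterations):
        v = eta * grad(x) + mu * v
        x = x - v
    return x


def grad_f(x):
    """Gradient of the example function F(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


plain = momentum_descent(grad_f, x0=0.0, eta=0.01, mu=0.0, iterations=500)
with_momentum = momentum_descent(grad_f, x0=0.0, eta=0.01, mu=0.9, iterations=500)
```

With `mu=0.0` the function reduces to plain gradient descent. In this example
the run with momentum ends up closer to the minimum after the same number of
iterations, although its path oscillates around the minimum on the way, which
is the overshooting mentioned above.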

In a learning environment, a point $x$ of the cost function is the set of
internal parameters unified with the expected net output. Since there is not
only one training example but many, there are also many expected output
points. The cost of more than one data point is therefore the sum of all costs.
This is called the objective function. It follows that in an iteration (epoch)
of the gradient descent algorithm, all data points need to be processed. This
is called batch gradient descent. In many cases though, processing all data
points in one epoch is not feasible because of the size of the dataset. In this
case, stochastic gradient descent is used. Instead of processing all data
points per epoch, a random subset is generated for each epoch. If the
subsampling is random enough in each epoch, this method optimizes an
approximation of the objective function. Though each individual epoch might not
sufficiently approximate the objective function, the repeated random sampling
does. Stochastic gradient descent is therefore a common approach to train a
neural network with big datasets.
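
The idea of optimizing on a random subset per epoch can be illustrated with a
deliberately small problem. The Python sketch below fits the slope of a line
through the origin with stochastic gradient descent. The noise-free dataset
$y = 2x$, the learning rate, the subset size and the use of the mean instead of
the sum of the subset costs (which only rescales the learning rate) are
assumptions made for this example.

```python
import random


def sgd_fit_slope(data, eta, epochs, batch_size, rng):
    """Fit y ~ w * x by descending the squared error of random subsets."""
    w = 0.0
    for _ in range(epochs):
        batch = rng.sample(data, batch_size)  # new random subset per epoch
        grad = sum(2.0 * (w * x - y) * x for x, y in batch) / batch_size
        w -= eta * grad
    return w


# Hypothetical dataset: 100 points on the line y = 2x, x in [0.1, 10].
data = [(k / 10.0, 2.0 * k / 10.0) for k in range(1, 101)]
slope = sgd_fit_slope(data, eta=0.01, epochs=500, batch_size=10,
                      rng=random.Random(0))
```

Each epoch only sees 10 of the 100 points, yet the repeated random sampling
drives the slope towards 2.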


6.3.3 Convolutional Neural Networks 


A Convolutional Neural Network (CNN) is an extension to the common
MLP, originally designed for two-dimensional data, like images. As the name
suggests, it adds convolutional layers to the set of possible layers in an MLP.
There is an analogy here with the primary visual cortex of a cat, which 
also uses convolution-like simple cells to extract information from spatially 
close overlapping regions of the field of view [23]. In [24], the authors 
showed that the backpropagation algorithm can be extended for the training of 
CNNs by introducing an update and backpropagation rule for convolutional 
layers. 


Convolutional Layer 


The convolution layer differs in two ways from the common fully connected 
layer of an MLP: 


1. Convolution layers only sum up a fixed window of the input signal. They 
are therefore only locally connected. This connection window is called 
receptive field of the layer. 

2. Each possible position of a receptive field uses the same weights to 
produce an output. This is called weight sharing. 


The output signal is produced in a sliding window fashion, by applying a
weighted summation of the receptive field for each possible receptive field
position. The output contains as many values as there are possible positions.
It is exactly a convolution of the input signal, where the layer weights form
the convolution filter (kernel). A convolution layer can have several filters,
thus forming a filter bank, which is analogous to the number of hidden units in
this layer.
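
The sliding window computation with shared weights can be sketched in plain
Python. This is a minimal illustration with assumed function names; as is
common in CNN implementations, the kernel is applied without flipping, and
bias and activation function are omitted.

```python
def conv2d_valid(image, kernel):
    """Weighted sum of the receptive field at every position where the
    kernel fits completely inside the image (same weights everywhere)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[r + a][c + b]
                for a in range(kh) for b in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def filter_bank(image, kernels):
    """Apply several kernels; each one produces its own output map."""
    return [conv2d_valid(image, kernel) for kernel in kernels]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
diagonal_difference = [[1, 0],
                       [0, -1]]
feature_map = conv2d_valid(image, diagonal_difference)
maps = filter_bank(image, [diagonal_difference, [[1, 1], [1, 1]]])
```

A 3 x 3 input and a 2 x 2 receptive field give 2 x 2 possible positions, so
each output map has four values.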


Pooling Layer


Another important extension of the MLP is the pooling layer. A pooling layer
performs a subsampling of the input signal by combining each small window
of the input signal into a single value. A common pooling function
is max-pooling, which calculates the maximum of its receptive field. Another
pooling function is average-pooling, which computes the average value in
its receptive field. A pooling can be seen as a convolution with a special
function and a stride that equals the filter size of the pooling kernel. Regular
convolutions have a stride of 1, meaning every pixel position is computed
in the convolution. A stride of 2 means that every other pixel position is
computed. The purpose of pooling is not only to reduce the spatial size of the
input signal, but also to make the activations more robust to small
translations of the input.
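
Pooling with a configurable window size and stride can be sketched as follows.
The function names and the 4 x 4 example input are assumptions for
illustration; the combining function is passed in, so the same code covers
max-pooling and average-pooling.

```python
def pool2d(image, size, stride, combine=max):
    """Combine each size x size window into one value, moving by `stride`."""
    rows = range(0, len(image) - size + 1, stride)
    cols = range(0, len(image[0]) - size + 1, stride)
    return [
        [
            combine([image[r + a][c + b]
                     for a in range(size) for b in range(size)])
            for c in cols
        ]
        for r in rows
    ]


def average(values):
    """Arithmetic mean of a list of values."""
    return sum(values) / len(values)


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]

max_pooled = pool2d(image, size=2, stride=2)
avg_pooled = pool2d(image, size=2, stride=2, combine=average)
overlapping = pool2d(image, size=2, stride=1)
```

With a stride equal to the window size the 4 x 4 input shrinks to 2 x 2; with a
stride of 1 the windows overlap and a 3 x 3 map remains.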


Multiscale CNN 


A variation of convolutional neural networks is the Multiscale CNN. Instead
of processing an input signal as it is, the Multiscale CNN processes several
scaled-down versions of the signal simultaneously. This approach increases the
ability to extract scale-invariant features without the need to increase the
size of the extracted pixel neighborhood patch windows. The extracted feature
maps of each scale are finally combined to produce a joint feature map. This
can be done by a fully connected layer that takes all feature maps as an input
to compute its output. For the Scene Labeling application, an image pyramid
has to be created prior to the extraction of image patches for each scale, which
are then fed to the Multiscale CNN.
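
An image pyramid of this kind can be built by repeatedly scaling the image
down. The Python sketch below is only one possible construction: it assumes a
scale factor of 2 between levels and uses simple 2 x 2 block averaging, whereas
the text does not prescribe a particular scale factor or resampling filter.

```python
def downscale_by_two(image):
    """Halve width and height by averaging non-overlapping 2 x 2 blocks."""
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, len(image[0]) - 1, 2)
        ]
        for r in range(0, len(image) - 1, 2)
    ]


def image_pyramid(image, levels):
    """Return the original image followed by levels - 1 downscaled copies."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downscale_by_two(pyramid[-1]))
    return pyramid


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
pyramid = image_pyramid(image, levels=3)
```

Patches of the same pixel size taken from the coarser levels cover a larger
part of the scene, which is what lets the network see more context without
larger patch windows.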


Patch Based and Image Based Application 


Neural networks for image classification tasks were traditionally designed so
that they process a complete image of fixed size and produce classification
results of a fixed size as well. Big image sizes automatically implied that
the fully connected hidden layers also had a large number of hidden units.
This led to the input image sizes being reduced to keep the neural
networks scalable and computable. In order to apply neural networks in a
pixel classification scheme, image patches had to be extracted at each pixel
position that needs to be classified. In many cases, these extractions are applied
sparsely across the image to produce a coarse pixel classification.

A patch based application of CNNs for pixel classification tasks is
computationally very inefficient, because image patches for neighboring pixels
overlap. Therefore, the same convolutions are computed multiple times.
This redundancy can be avoided by applying CNNs in an image based
fashion. This affects several of the aforementioned components of the
neural network, since they have been designed with a patch based
application in mind. The fully connected layer in particular is not applicable
in an image based application, because full connectivity contradicts the local
connectivity of the convolution layers for arbitrary image sizes. The adequate
translation of a fully connected layer from the patch based approach is
actually another convolution layer, with a 1 x 1 convolution on all locally
connected input values.
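
The equivalence between a fully connected layer and a 1 x 1 convolution can be
illustrated with a minimal sketch. The function names, the tiny feature maps
and the weights below are hypothetical and chosen only for illustration; the
sketch assumes that the fully connected layer sees exactly one value per
feature map at a given pixel location.

```python
def fully_connected(features, weights, bias):
    """Patch based view: one feature vector in, one score per output unit out."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def conv1x1(feature_maps, weights, bias):
    """Image based view: the same weights are applied at every pixel.

    feature_maps[c][y][x] holds input channel c; the result is out[k][y][x].
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    out = []
    for row, b in zip(weights, bias):
        out.append([[sum(w * fmap[y][x] for w, fmap in zip(row, feature_maps)) + b
                     for x in range(width)]
                    for y in range(height)])
    return out


# Two input channels on a 2 x 3 image, three output units (hypothetical values).
maps = [[[1, 2, 3], [4, 5, 6]],
        [[0, 1, 0], [2, 0, 2]]]
weights = [[1, -1], [2, 0], [0, 3]]
bias = [0, 1, -1]

scores = conv1x1(maps, weights, bias)
```

Evaluating `fully_connected` on the feature vector of any single pixel gives
the same values as reading `scores` at that pixel, but the image based version
shares one pass over the whole image.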

Another layer type that works differently in an image based application
is the pooling layer. A naive translation would result in a huge loss of output
resolution, since pooling layers in patch based mode are designed to subsample
the input signal. A patch based application at every possible pixel location,
however, does not share this subsampling property: the patch based approach
really evaluates every pixel location, while a naive image based approach
fully evaluates only a subset of all pixel locations due to the subsampling.
To remove the subsampling property, the pooling must be applied in a
convolutional manner (overlapping pooling). The output maps of such an
overlapping pooling clearly differ from the maps of a non-overlapping pooling.
In particular, pixels that are neighbors in the output of a non-overlapping
pooling are no longer neighbors. If a convolution layer follows, its output
maps are therefore calculated incorrectly. This can be corrected by reordering
the pixels after the pooling layer into n x n subimages, where n is the size
of the pooling kernel or the stride, and applying the following layers to each
subimage independently [25]. This reordering is referred to as fragmentation
in the following, because the input map is fragmented into smaller output
maps. Figure 6.3 shows such a fragmentation after the application of a 2 x 2
pooling producing 2 x 2 subimages.


Figure 6.3 Example of a fragmentation after a 2 x 2 pooling. The naïve approach would 
only produce the bright pixels, while an overlapping pooling produces all other possible pixels 
(purple, green, and blue). These pixels must be reordered to be able to correctly continue with 
the forward propagation of the neural network. 
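
The reordering can be sketched in a few lines of plain Python. This is a
minimal illustration under assumptions, not the implementation from [25]: maps
are nested lists, the pooling uses stride 1, and the function names are
hypothetical.

```python
def max_pool(img, n=2, stride=1):
    """n x n max pooling; stride 1 gives the overlapping variant."""
    h, w = len(img), len(img[0])
    return [[max(img[y + dy][x + dx] for dy in range(n) for dx in range(n))
             for x in range(0, w - n + 1, stride)]
            for y in range(0, h - n + 1, stride)]


def fragment(img, n=2):
    """Split a map into n x n subimages; subimage (i, j) takes every n-th
    pixel starting at row i and column j."""
    return {(i, j): [row[j::n] for row in img[i::n]]
            for i in range(n) for j in range(n)}


def defragment(frags, n=2):
    """Inverse of fragment: interleave the subimages into one map again."""
    h = sum(len(frags[(i, 0)]) for i in range(n))
    w = sum(len(frags[(0, j)][0]) for j in range(n))
    out = [[None] * w for _ in range(h)]
    for (i, j), sub in frags.items():
        for y, row in enumerate(sub):
            for x, value in enumerate(row):
                out[i + n * y][j + n * x] = value
    return out


# Hypothetical 7 x 7 test image.
image = [[(3 * y + 5 * x) % 11 for x in range(7)] for y in range(7)]
pooled = max_pool(image, n=2, stride=1)   # overlapping pooling, 6 x 6 result
frags = fragment(pooled, n=2)             # four 3 x 3 subimages
```

Subimage `(0, 0)` equals the result of the naive non-overlapping pooling
(`max_pool(image, 2, 2)`); the other three subimages hold the pixel locations
that the naive approach would skip. Within each subimage, neighboring pixels
are again n input pixels apart, so a following convolution sees the same
neighborhoods as in the patch based mode.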


For Multiscale CNNs, an image based application introduces another
difficulty that needs to be solved. In a patch based approach, the image
patches for each scale have to be extracted and each patch has the same
size. In an image based approach, however, the feature maps for different
scales are of different size. This becomes a challenge in the fully connected
layer, which combines the feature maps of all scales. Since there are no fully
connected layers in the image based approach, the feature maps of each scale
need to be transformed so that a regular convolution layer can handle them.
The simplest solution is to scale the smaller maps up so that they all match
in size. If the maps have been fragmented because of a pooling layer, they
need to be defragmented before they are scaled up. Defragmentation is the
reverse function of fragmentation, turning multiple smaller maps into one
bigger map.


6.4 CNN for Scene Labeling 


There are many ways to perform Scene Labeling on images. CNNs have
proven themselves useful for this task, because they achieve state of the art
performance without the need to develop complex multi cue frameworks that
combine different inputs and sensors. Additionally, many frameworks for
modeling, training and execution of CNNs exist, e.g., Caffe [26], Torch7 [27],
Theano [28], Pylearn2, which is built on top of Theano, and cuda-convnet
[29]. These frameworks exploit the parallelizability of CNNs to provide fast
implementations using General Purpose GPUs (GPGPU).
Furthermore, the research community is actively training and publishing
models, which can often be adapted to a specific task by resuming the training
with corresponding data. The most frequently used models are AlexNet [30],
GoogLeNet [31] and VGG [32]. They differ in complexity and run time
efficiency, but each reached state of the art performance at its time of
publication for certain challenges on datasets like ImageNet [33]. A high
network capacity is needed to achieve a high accuracy on such complex tasks.
The trained models are therefore rather big and need a huge amount of
computational power. Incorporating this into an embedded system with low
power consumption, as is needed for ADAS, is still a great challenge.

The following section describes one possible model with reduced complexity,
selected for implementation in the course of the DESERVE project.
Its purpose is to detect the road, vehicles and vulnerable road users, which can
then be utilized for lane prediction and pedestrian detection.


6.4.1 Exemplary Network for Scene Labeling


The proposed model is derived from the Multiscale CNN used in [34]. It 
consists of 2 convolutional layers and 2 pooling layers. The activation function, 
used after the convolutional layers, is the ReLU function (see Figure 6.2). 
Each convolution layer contains a bank of 16 x (7 x 7) filter kernels. These 
four layers are applied on three scales of the input image and combined by a 
fully connected layer, producing 6 output channels: background, road, vehicle 
(including cars, trucks, buses, ...), vru (vulnerable road users: pedestrians,
cyclists, ...), sky and infrastructure (buildings, signs, barriers, traffic lights, 
...). Those channels are normalized by a softmax layer to produce class 
probability maps for each class. By applying an argmax on these maps a 
class membership map is produced returning the most probable class for each 
pixel. The input images are preprocessed by transforming them into an image 
pyramid and locally normalizing them afterwards to zero mean unit variance 
in a 15 x 15 neighborhood. Figure 6.4 shows the complete toolchain and 
Figure 6.5 the network topology in more detail. 
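
The last two steps of this chain, the pixel wise softmax and the argmax, can
be sketched as follows. The channel order in `CLASSES` and the example scores
are assumptions made for illustration; the text only fixes the set of six
classes.

```python
import math

# Assumed channel order; the source lists the classes but not their indices.
CLASSES = ["background", "road", "vehicle", "vru", "sky", "infrastructure"]


def softmax(scores):
    """Normalize one pixel's class scores to probabilities that sum to 1."""
    peak = max(scores)                       # subtract the maximum for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_image(score_maps):
    """score_maps[k][y][x] -> (probability maps, class membership map)."""
    k = len(score_maps)
    h, w = len(score_maps[0]), len(score_maps[0][0])
    probs = [[[0.0] * w for _ in range(h)] for _ in range(k)]
    labels = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            p = softmax([score_maps[c][y][x] for c in range(k)])
            for c in range(k):
                probs[c][y][x] = p[c]
            labels[y][x] = max(range(k), key=p.__getitem__)   # argmax
    return probs, labels


# Hypothetical network output for a 1 x 2 image: six channels, two pixels.
score_maps = [[[0.1, 0.0]], [[2.5, 0.3]], [[0.2, 0.1]],
              [[-1.0, 0.0]], [[0.0, 3.0]], [[0.4, 1.0]]]
probs, labels = label_image(score_maps)
```

For this input the first pixel is labeled `road` and the second `sky`, because
those channels carry the largest scores at the respective positions.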


6.4.2 Evaluation 


The topology described in subsection 6.4.1 was trained with 6895 labeled
night time images of a near infrared camera used in the NV3 night vision
system of a Mercedes-Benz S-Class. The images show mainly rural, but also
urban, road scenes under different weather conditions and in different seasons.
To augment the heavily under-represented vru class, 15174 images are added
to the aforementioned set of images, of which only the pedestrian and cyclist
labels are used. This is called the learn set. The training scheme is stochastic
gradient descent with the logistic regression objective function for 6 classes.


Figure 6.4 The complete processing chain from input image to a scene labeled image is
displayed (stages: preprocessing, application of the multiscale CNN, pixel classification).
After building an image pyramid of 3 layers and the local normalization, every scale
is fed to its own processing chain. This produces 6 class membership probability maps. They
can be interpreted and augmented as seen in the output image.

Figure 6.5 The image pyramid construction layer produces 3 scales that are locally
normalized in 15 x 15 windows. Every scale is propagated independently. There are in total 2
convolution layers with 16 x 7 x 7 filter kernels using the ReLU activation function. After
activation a 2 x 2 max-pooling is performed followed by a fragmentation in the first pooling
layer. A second fragmentation is not necessary since the second pooling layer is followed by
a defragmentation. The small scaled feature maps are sampled up and fed to a classification
layer, being a 6 x 1 x 1 convolution layer. Finally, a pixel wise softmax is applied.


The network is trained for 10,000 epochs with 40960 balanced training examples
(patches) per epoch. The learning rate was determined from several short runs
of 100 epochs with different learning rates; the rate that progressed best
was then chosen. During training, the learning rate was linearly reduced after
5000 epochs by a factor of 0.995 per epoch. Figure 6.6 shows the training
progress (2-2-16 topology) in relation to the objective function on the learn
set. Two other topologies were also trained in the same way. One introduced
a third convolution layer including the ReLU activation function after the
second pooling (3-2-16 topology). The third topology is similar to the 3-2-16
topology, but uses 32 filters per convolution (3-2-32 topology). Figure 6.6
shows that the topology with the fewest trainable parameters (2-2-16 topology)
performed worst during training. The introduction of another convolution
layer (3-2-16 topology) resulted in a better learn curve. Doubling the number
of filters (3-2-32 topology) improved the learn performance yet again.

Figure 6.6 Displayed are the learn curves of three different network topologies. Each
topology was trained three times and the learn curves were averaged. The averaged learn
curves are displayed as solid lines while the standard deviation for 50 epochs is displayed as
the area around the lines.


Table 6.1 The confusion matrix of topology 3-2-32 and the respective FNR, FPR and IU for
each class. The classes are background (Bg), road (Rd), vehicle (Veh), sky, vulnerable road
users (VRU) and infrastructure (Inf). Each cell shows the percentage (of all pixels in the
dataset) of actual class (row) predicted as class (column)

Actual \ Pred.   Bg        Rd        Veh       Sky       VRU       Inf
Bg               24.9349   1.9400    1.1226    2.1282    0.3359    5.8754
Rd               1.5685    29.4059   1.0226    0.0034    0.1269    0.3226
Veh              0.1042    0.0829    3.6523    0.0051    0.1156    0.7749
Sky              1.7298    0.0080    0.1744    7.1476    0.0083    0.9632
VRU              0.0058    0.0032    0.0740    0.0001    0.0733    0.0777
Inf              1.6244    0.0459    1.0077    0.3351    0.3538    12.8450
FNR (%)          31.38     9.38      22.87     28.75     68.68     20.77
FPR (%)          16.79     6.61      48.22     25.70     92.77     38.42
IU (%)           60.27     85.16     44.89     57.17     6.24      53.02


Since the classifier of topology 3-2-32 appears to have the best performance,
it is evaluated on a separate evaluation set of 200 images that have not been
part of the learn set, called the eval set. Evaluation in multiclass problems
is done by analyzing the confusion matrix. The confusion matrix for topology
3-2-32 is displayed in Table 6.1. It shows the class predictions in relation
to the actual class. The diagonal entries form the true positives (pixels that
were classified correctly, TP) for each class, while the remaining entries of
a row or column display the individual false negatives (pixels not classified
as the desired class, FN) and false positives (pixels falsely classified as
the desired class, FP), respectively. Therefore, the sum over one row of the
table gives the percentage of the respective class in the whole dataset.
The quality measures of binary classification problems can therefore be
applied for each class individually in a “one versus all” fashion. Classic
measures include the False Negative Rate (FNR), the False Positive Rate
(FPR) and the Intersection over Union (IU). Those are defined as follows:

$$\mathrm{FNR} = \frac{FN}{TP \cup FN}, \qquad \mathrm{FPR} = \frac{FP}{TP \cup FP}, \qquad \mathrm{IU} = \frac{TP}{TP \cup FP \cup FN}$$

N denotes the number of all pixels evaluated. FNR and FPR are 0 if the
classification is correct and grow as more pixels are classified incorrectly.
The IU has a value of 1 in case of a perfect classification, and the value gets
smaller as more pixels are classified incorrectly.

Table 6.1 shows the percentage of pixels classified as one of the 6 classes.
The last 3 rows display the class-wise FNR, FPR and IU. The confusion matrix
shows several interesting features:


a. The diagonal entries show the true positives, the correctly classified
pixels. Since the total amount of pixels in the evaluation dataset varies
from class to class, the maximum possible value for each entry varies as well.

b. For the class vulnerable road users (VRU) the classifier performs badly.
More VRU pixels are classified as vehicle (Veh) or infrastructure (Inf)
than as VRU, resulting in a bad FNR. Even worse is the FPR, since the
amount of background (Bg) or infrastructure (Inf) pixels classified as
VRU is far greater than the amount of correctly classified pixels. This
results in a bad IU.

c. The best performing class is the class road (Rd). It has comparatively
few false positives and negatives, which results in a good FNR, FPR and
IU.

d. The class vehicle (Veh) shows a mixed performance. Though its FNR
is quite good and better than that of the class background (Bg), its FPR
is second to last. The IU is therefore greatly affected.
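
These observations can be checked directly against the confusion matrix. The
sketch below assumes that the class-wise measures are computed as
FNR = FN / (TP + FN), FPR = FP / (TP + FP) and IU = TP / (TP + FP + FN); with
these assumed definitions the values for, e.g., the Bg, Veh and VRU classes
agree with the last three rows of Table 6.1 to within rounding. The matrix
entries are copied from Table 6.1; the function and variable names are
hypothetical.

```python
LABELS = ["Bg", "Rd", "Veh", "Sky", "VRU", "Inf"]

# Confusion matrix from Table 6.1 in percent of all pixels:
# rows are the actual class, columns the predicted class.
CONFUSION = [
    [24.9349, 1.9400, 1.1226, 2.1282, 0.3359, 5.8754],
    [1.5685, 29.4059, 1.0226, 0.0034, 0.1269, 0.3226],
    [0.1042, 0.0829, 3.6523, 0.0051, 0.1156, 0.7749],
    [1.7298, 0.0080, 0.1744, 7.1476, 0.0083, 0.9632],
    [0.0058, 0.0032, 0.0740, 0.0001, 0.0733, 0.0777],
    [1.6244, 0.0459, 1.0077, 0.3351, 0.3538, 12.8450],
]


def per_class_metrics(c):
    """One-versus-all measures per class (assumed definitions, see lead-in)."""
    n = len(c)
    result = []
    for k in range(n):
        tp = c[k][k]
        fn = sum(c[k]) - tp                          # rest of row k
        fp = sum(c[i][k] for i in range(n)) - tp     # rest of column k
        result.append({"FNR": fn / (tp + fn),
                       "FPR": fp / (tp + fp),
                       "IU": tp / (tp + fp + fn)})
    return result


metrics = per_class_metrics(CONFUSION)
for name, m in zip(LABELS, metrics):
    print(name, " ".join(f"{key}={100 * value:.2f}" for key, value in m.items()))
```

Because the entries of the printed table are rounded to four decimals, the
recomputed percentages can deviate from the published ones in the last digit.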


After analyzing each class by itself, the question arises of how good this
classifier is compared to classifiers with other well and badly performing
classes. A common measure to describe the overall performance of a classifier
is the accuracy (ACC). It is the ratio of correctly classified pixels to all pixels.
Let N be the number of classes and $C_{i,j}$ be the number of pixels from class i
classified as class j. In a multiclass setup, the accuracy can then be defined as:

$$\mathrm{ACC} = \frac{\sum_{i=1}^{N} C_{i,i}}{\sum_{i=1}^{N} \sum_{j=1}^{N} C_{i,j}}$$


This measure captures the correctness of a classifier in a straightforward way.
The value is in the range [0, 1], where a perfect classifier reaches 1. If
one or more classes are under-represented in the evaluation dataset, the
expressiveness of this measure suffers, since it does not normalize the number
of samples per class. One way to increase the sensitivity to underperforming
classes is to average the FNR, FPR or IU over the classes. The Matthews
Correlation Coefficient (MCC) was designed for binary classifications and
computes a correlation between the actual and predicted classifications. It
was extended to incorporate more than two classes and is defined by [35] as
follows:

$$\mathrm{MCC} = \frac{\sum_{k,l,m=1}^{N} \left( C_{k,k}\, C_{l,m} - C_{k,l}\, C_{m,k} \right)}{\sqrt{\sum_{k=1}^{N} \left( \sum_{l=1}^{N} C_{k,l} \right) \left( \sum_{k' \neq k} \sum_{l'=1}^{N} C_{k',l'} \right)} \; \sqrt{\sum_{k=1}^{N} \left( \sum_{l=1}^{N} C_{l,k} \right) \left( \sum_{k' \neq k} \sum_{l'=1}^{N} C_{l',k'} \right)}}$$

The Matthews Correlation Coefficient is in the range [-1, 1]. An MCC of 1
indicates a perfect classifier, while -1 indicates total contradiction. An MCC
of 0 corresponds to a random classifier. Table 6.2 shows the ACC, mean IU, MCC
and mean FNR for the classifiers trained in Figure 6.6. It can be seen that
topology 3-2-32 outperforms the other topologies in all defined measures.
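
As a plausibility check, the two overall measures can be recomputed from the
confusion matrix in Table 6.1. The sketch assumes the common multiclass
generalization of the MCC, written here in terms of row sums (actual pixels
per class) and column sums (predicted pixels per class); whether [35] uses
exactly this notation is an assumption. The function names are hypothetical.

```python
import math

# Confusion matrix of topology 3-2-32 (Table 6.1), rows = actual class,
# columns = predicted class, entries in percent of all pixels.
CONFUSION = [
    [24.9349, 1.9400, 1.1226, 2.1282, 0.3359, 5.8754],
    [1.5685, 29.4059, 1.0226, 0.0034, 0.1269, 0.3226],
    [0.1042, 0.0829, 3.6523, 0.0051, 0.1156, 0.7749],
    [1.7298, 0.0080, 0.1744, 7.1476, 0.0083, 0.9632],
    [0.0058, 0.0032, 0.0740, 0.0001, 0.0733, 0.0777],
    [1.6244, 0.0459, 1.0077, 0.3351, 0.3538, 12.8450],
]


def accuracy(c):
    """Ratio of correctly classified pixels (diagonal) to all pixels."""
    return sum(c[k][k] for k in range(len(c))) / sum(map(sum, c))


def mcc(c):
    """Multiclass Matthews Correlation Coefficient (assumed generalization)."""
    n = len(c)
    total = sum(map(sum, c))
    correct = sum(c[k][k] for k in range(n))
    actual = [sum(c[k]) for k in range(n)]                          # row sums
    predicted = [sum(c[i][k] for i in range(n)) for k in range(n)]  # column sums
    numerator = correct * total - sum(p * t for p, t in zip(predicted, actual))
    denominator = math.sqrt((total ** 2 - sum(p * p for p in predicted)) *
                            (total ** 2 - sum(t * t for t in actual)))
    return numerator / denominator


print(f"ACC = {accuracy(CONFUSION):.2f}, MCC = {mcc(CONFUSION):.2f}")
```

Rounded to two decimals, both values agree with the 3-2-32 row of Table 6.2.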


6.5 Hardware Platforms for Scene Labeling 


Embedded hardware platforms for Advanced Driver Assistance Systems face
several challenges. They have to provide a huge amount of processing power to
keep up with the rising complexity of applications and the increasing amount
of data they have to process. However, the platforms should have low power
consumption.

At one end of the spectrum of hardware architectures, General Purpose
Processors (GPPs) usually do not fulfill all the requirements and restrictions of
embedded systems in advanced driver assistance systems. They offer a high
degree of flexibility due to their arbitrary programmability, but they usually
cannot meet the high demand for processing power while respecting the
restrictions on power consumption.


Table 6.2 Displayed are the measures Accuracy (ACC), mean Intersection over Union
(mIU), Matthews Correlation Coefficient (MCC) and mean False Negative Rate (mFNR) for
3 topologies

Topology   ACC    mIU    MCC    mFNR
2-2-16     0.60   0.35   0.50   0.44
3-2-16     0.69   0.42   0.60   0.37
3-2-32     0.78   0.51   0.71   0.30

At the other end of the spectrum, Application Specific Integrated Circuits
(ASICs) provide a high degree of processing power and excellent power
efficiency. However, they are not flexible as they are fixed after manufacturing
and cannot be programmed.

There is a wide range of hardware platforms in between these two
extremes, which provide a trade-off between the different characteristics. For
example, Graphical Processing Units (GPUs) have been used to accelerate the
execution of complex algorithms. They provide a certain degree of flexibility,
as they are programmable and they achieve high processing power due to a
high degree of parallelism. However, the power consumption of GPUs is fairly
high and they are therefore not suitable for use in passenger cars.

Adapting processor architectures to a given application is a promising 
approach for designing hardware platforms. Application-Specific Instruction- 
Set Processors (ASIPs) are based on programmable processor architectures. 
These are adapted to a specific application or a class of similar applications, 
e.g., by extending the instruction set, by adding dedicated hardware acceler- 
ators for frequently used operations, or by changing architectural parameters 
in order to bypass bottlenecks. 

Scene labeling has been implemented on several platforms including 
CPUs, GPUs, FPGAs, and ASICs. This section gives an overview of recent 
implementations of convolutional neural networks on different types of 
computing platforms. At first, the computational complexity of convolutional 
neural networks is discussed, by deriving a measure of the total number of 
operations needed in order to compute the forward propagation of one frame 
through the network. This also serves as a basis for the comparison of different 
implementations, which is presented later. 


6.5.1 Theoretical Performance Requirements 


This section describes the computational complexity of convolutional neural 
networks in terms of operations needed in the forward propagation of a frame. 
This number of operations clearly depends on the topology of the network. 

The most computationally intensive task is the convolution, especially as
many convolution layers contain a huge number of filters. For an input image
of size w x h and a convolution kernel of size n x n, the kernel is applied
(w - (n - 1))(h - (n - 1)) times. Each time, n^2 multiplications are performed
and the results accumulated. Counting the multiply and accumulate operations
as two, this leads to a total count of

$$N_{conv}(w, h, n) = 2\,(w - (n - 1))\,(h - (n - 1))\,n^2$$

operations for a single convolution.

The activation function is applied to each pixel of its input.
Therefore, the total number of operations for an input image of size w x h is
given as

$$N_{act}(w, h, c_{act}) = w\,h\,c_{act},$$

where $c_{act}$ describes the cost of applying the activation function to one pixel.
In case of the ReLU (Rectified Linear Unit), the operation determines the
maximum of the input value and 0. Therefore, $c_{ReLU} = 1$.

For the pooling layer, the number of operations depends not only on the
size w x h of the input frame, but also on the kernel size n x n and the stride s.
In some cases, the stride equals the kernel size, but in overlapped pooling, a
stride of 1 might be used. In general, the number of operations performed in
a pooling layer can be described as

$$N_{pool}(w, h, n, s) = \frac{w\,h + (s - n)\,((s - n) + w + h)}{s^2}\; c_{pool},$$

where $c_{pool}$ is the number of operations per pooling window. For a
max-pooling, the number of operations is $c_{max} = n^2 - 1$, for an
average-pooling, the number of operations is $c_{avg} = n^2 + 1$.

For the exemplary convolutional neural network described in
subsection 6.4.1, which is named 2-2-16 in Table 6.2, the following remarks
give the numbers of operations for the single layers. The image preprocessing,
i.e., the construction of the image pyramid and the normalization, is not
counted in this section.

In this exemplary case, the input image has 1024 x 512 pixels. In the
preprocessing step, an image pyramid is generated by an iterative process. In
each iteration, the image dimensions are halved by subsampling. Afterwards,
the three scaled images from the pyramid are padded by replicating the border
pixels in order to maintain the correct output size after the convolutions. The
resulting image sizes are listed in Table 6.3.

Table 6.3 Input image sizes for three different scales in the exemplary convolutional neural
network

Scale   Pyramid Output   Padded
S       512 x 256        534 x 278
M       256 x 128        278 x 150
L       128 x 64         150 x 86


The first convolution layer performs 16 convolutions with a 7 x 7 kernel
and generates 16 output images. The convolution is only performed for pixels
where the convolution kernel fits into the input image, so that the resulting
image is reduced by 6 pixels in width and height. The convolution layer is
followed by an activation layer, which applies the activation function to each
of the 16 output images of the convolutions. The following max-pooling layer
uses a 2 x 2 patch and a stride of 1 (overlapped pooling). It does not change
the total number of pixels but, through the fragmentation, separates one image
into four subimages of quarter size. The fragmentation of the images does not
contribute to the number of operations since it can be hidden in the other
layers. The second convolution layer performs 16 convolutions of size 7 x 7 on
each of the 16 fragmented images and then accumulates them to 16 fragmented
output images. The following activation function and pooling layers work the
same as after the first convolution layer.

This flow of images through two convolution layers with activation 
functions and two pooling layers is performed independently for the three 
scales of the input image. The resulting images are scaled to the same size 
before they are fed into the classification layer. 

The classification layer at the end performs one convolution of size 1 x 1 
per output class, of which there are six in the exemplary convolutional neural 
network. 

With these image and filter sizes, the computational complexity of the 
convolutional neural network can be estimated using the equations above. 
Table 6.4 gives the operation counts for the three scales by layer type. 

The total number of operations performed for one input image is
4,796,792,784. As expected, the convolution layers contribute the biggest
share in the number of operations, with a proportion of 99.2 percent. In order
to reach a processing rate of 30 frames per second, 144 billion operations have
to be performed per second.
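
The convolution, activation and pooling counts of Table 6.4 can be reproduced
from the three equations above and the padded image sizes of Table 6.3. The
sketch below rests on one assumption that is not spelled out in the text: each
fragment is taken to have exactly half the width and height of the map after
the first convolution. The function names are hypothetical, and the
classification layer is left out.

```python
def n_conv(w, h, n):
    """Operations of one n x n convolution on a w x h image (MAC counted as 2)."""
    return 2 * (w - (n - 1)) * (h - (n - 1)) * n * n


def n_act(w, h, c_act=1):
    """Operations of the activation function on a w x h image (ReLU: c_act = 1)."""
    return w * h * c_act


def n_pool(w, h, n, s, c_pool):
    """Operations of an n x n pooling with stride s on a w x h image."""
    windows = (w * h + (s - n) * ((s - n) + w + h)) // (s * s)
    return windows * c_pool


def count_scale(padded_w, padded_h, filters=16, k=7, pool=2):
    """Return (convolution, activation, pooling) operations for one scale."""
    c_max = pool * pool - 1
    # Layer 1: 16 convolutions on the padded input image.
    w1, h1 = padded_w - (k - 1), padded_h - (k - 1)
    conv = filters * n_conv(padded_w, padded_h, k)
    act = filters * n_act(w1, h1)
    pooling = filters * n_pool(w1, h1, pool, 1, c_max)
    # Fragmentation: pool * pool subimages, assumed to be exactly half-sized.
    fragments = pool * pool
    fw, fh = w1 // pool, h1 // pool
    # Layer 2: 16 x 16 convolutions on every fragment.
    w2, h2 = fw - (k - 1), fh - (k - 1)
    conv += fragments * filters * filters * n_conv(fw, fh, k)
    act += fragments * filters * n_act(w2, h2)
    pooling += fragments * filters * n_pool(w2, h2, pool, 1, c_max)
    return conv, act, pooling


PADDED = {"S": (534, 278), "M": (278, 150), "L": (150, 86)}   # Table 6.3
for scale, (w, h) in PADDED.items():
    print(scale, count_scale(w, h))
```

Under the stated assumption, the three printed tuples match the Convolution,
Activation and Pooling columns of Table 6.4 for the scales S, M and L.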


Table 6.4 Number of operations for the exemplary convolutional neural network

Scale   Convolution     Activation   Pooling      Classif.     Operations
S       3,590,995,968   4,444,416    13,220,592   12,582,912   3,621,243,888
M         922,435,584   1,175,808     3,470,064    3,145,728     930,227,184
L         243,253,248     327,936       954,096      786,432     245,321,712
Total   4,756,684,800   5,948,160    17,644,752   16,515,072   4,796,792,784

Table 6.5 lists implementations of convolutional neural networks on different
platforms and gives the performance in terms of performed operations per
second. When available, two numbers are given for each implementation. The
peak performance gives the theoretical maximum number of operations per
second that the platform can perform. The real performance gives the number
of operations per second for CNNs of different topologies on the platform. Not
all implementations listed in the table are used for scene labeling; some perform
other image based detection and classification tasks with convolutional neural
networks. Therefore, the networks that are used in the applications may differ
in size. This is mentioned because some implementations do not scale up
to bigger networks easily. The subsequent sections give more details on the
entries in the table.


Table 6.5 Comparison of different implementations of convolutional neural networks on
different platforms

                                                          Perf. [GOPs]
Author                  Year  Device                      Peak   Real
CPU Implementations
Farabet et al. [39]     2011  Intel Core 2 Duo            10     1.1
Dundar et al. [40]      2013  Intel Core i7 4-core        200    90
Jin et al. [41]         2014  Intel Core i5               45     30
Zhang et al. [42]       2015  Intel Xeon                  -      12.87
GPU Implementations
Farabet et al. [39]     2011  nVidia GTX 480              1350   294
Dundar et al. [40]      2013  nVidia GTX 780              3977   620
Jin et al. [41]         2014  nVidia GTX 690              5622   530
Cavigelli et al. [43]   2015  nVidia GTX 780              3977   1781
Mobile GPU Implementations
Farabet et al. [39]     2011  nVidia GT335m               182    54
Dundar et al. [40]      2013  nVidia GTX 650m             182    54
Cavigelli et al. [43]   2015  nVidia Tegra K1             326    76
FPGA Implementations
Farabet et al. [39]     2011  Virtex 6 VLX240T            160    147
Dundar et al. [40]      2013  Zynq ZC706                  -      36
Gokhale et al. [44]     2014  Zynq ZC706                  -      227
Zhang et al. [42]       2015  Virtex 7 485t               -      61.62
ASIC Implementations
Pham et al. [45]        2012  neuFlow in IBM 45 nm        320    294
Chen et al. [46]        2015  Accelerator in 65 nm        -      452
Cavigelli et al. [47]   2015  Accelerator in 65 nm        274    203


6.5.2 CPU-based Platforms


As discussed before, running convolutional neural networks for scene labeling
or other image processing tasks involves a huge amount of computation.
For use in ADAS, CPUs cannot provide the necessary processing power
while also complying with the power budget restrictions. Active work is
under way to speed up the implementations (e.g., [36]). Algorithmic
research is also conducted in order to speed up the convolutions, e.g., [37, 38].

A reference implementation of the exemplary CNN from subsection 6.4.1
was written in C++. It is worth mentioning that the focus of this
implementation was not speed or efficiency. Instead, it was intended as a
reference for the assembler implementation described later. The implementations
of the image processing operations and the different layers of the convolutional
neural network make use of templates. This provides the flexibility to use different
data types for the pixel values and coefficients. The templates enabled the use
of fixed-point data types in order to analyze the trade-off between data width
and accuracy.

On an Intel Core i5-2400 with 3.1 GHz, the computations for one input 
image of size 1024 x 512 with double precision values and coefficients 
require about 11 seconds, which corresponds to about 436 MOPS. This 
implementation does not use multiple cores for computation. 


6.5.3 GPU-based Platforms 


Modern GPUs provide a huge amount of computing power that can be used
for general purpose computing (GPGPU). The use of GPUs is most beneficial
if the application provides a high degree of parallelism and regularity. CNNs
fall into this category. Therefore, most deep learning frameworks mentioned
in the previous section accelerate evaluation and training of networks with
GPUs using CUDA, and there are also frameworks specifically developed for
GPUs, e.g., cuda-convnet2 [29] and Marvin [48].

A downside of using the powerful GPUs is the amount of power they
consume, which makes the use of GPUs in mobile devices infeasible.
Nevertheless, GPUs can be used for training the networks, as the training is
performed offline. Recently, mobile or embedded GPUs have emerged, aiming
to provide low-power high-performance computing platforms.


6.5.4 FPGA-based Platforms 


An FPGA, a configurable hardware platform, provides a compromise between
the flexibility of a GPU and the efficiency of an ASIC. The high degree of
parallelism that is possible in an FPGA allows for high performance signal
processing. As double precision arithmetic is costly for a hardware-based
implementation, the C++ implementation of the algorithm was used to analyze
the quality of the classification depending on the data width of pixel values
and coefficients. For 32-bit data with 22 fractional bits, the computations are
exact and no errors appear. If 16-bit data with 11 fractional bits are used, about
1.4 percent of the pixels are classified incorrectly, which was acceptable in
this scenario.
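
The 16-bit format with 11 fractional bits can be illustrated with a small
sketch. The text does not state how the reference implementation handles sign,
rounding or overflow, so the signed two's complement range, round-to-nearest
and saturation used below are assumptions, as are the function names.

```python
def to_fixed(value, total_bits=16, frac_bits=11):
    """Quantize a real value to a signed fixed-point integer code.

    Assumptions: two's complement range, round-to-nearest, saturation.
    """
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = round(value * (1 << frac_bits))
    return max(lowest, min(highest, code))


def to_float(code, frac_bits=11):
    """Convert a fixed-point integer code back to a real value."""
    return code / (1 << frac_bits)


def fixed_mul(a, b, total_bits=16, frac_bits=11):
    """Multiply two codes; the raw product carries 2 * frac_bits fractional
    bits, so it is shifted back (rounding down) and then saturated."""
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, (a * b) >> frac_bits))


weight = to_fixed(0.1)      # not exactly representable: small quantization error
pixel = to_fixed(1.5)
product = fixed_mul(pixel, to_fixed(2.0))
print(to_float(weight), to_float(product))
```

With 11 fractional bits the resolution is 2^-11 and, under the signed
assumption, the representable range is roughly -16 to +16. Quantization errors
of this size in the coefficients and intermediate results are the kind of
effect that can flip the class decision for a small share of the pixels.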

The use of a soft core processor that is mapped to the FPGA also provides
software programmability of the design. In order to raise the computational
performance, the soft core processor can be extended with dedicated hardware
modules (application-specific instruction-set processor, ASIP). For example,
the instruction set can be extended by new functional units for complex
operations, which are placed in the processor's pipeline and perform as quickly
as the default operations. Additionally, more complex operations taking more
execution cycles can be added as external accelerators tightly coupled with
the processor's data path.

In the course of the DESERVE project, an ASIP implementation for
convolutional neural networks has been developed. It is based on the
TUKUTURI processor [49, 50], which was developed for image processing and
video coding implementations. It is a Very Long Instruction Word (VLIW)
processor with two issue slots and 64 bit wide registers that can be split
up into subwords of 8, 16, 32, or 64 bits. These subwords are processed
in parallel (microSIMD) by all default functional units. Additional features
include conditional execution in order to reduce control overhead, and a DMA
controller for memory transfer between external and internal memory.

As derived from the CPU-based reference implementation (see
subsection 6.5.2), 16 bit wide data is used for the pixel values and the network's
coefficients. Therefore, the SIMD feature can be used to process four values
in parallel, which gives a significant speed-up.

As seen in subsection 6.5.1, the convolution is the most computationally
intensive task in the whole process. Therefore, the TUKUTURI processor was
extended with a co-processor that performs 16 convolutions of four pixels at
once.

The internal memory of the TUKUTURI is not capable of holding a whole
input image. Therefore, the images are processed in blocks. The DMA module
supports block transfers, so that a rectangular subsection of the image can be
transferred between internal and external memory. The module holds a queue
of memory transfers, which are processed independently of the TUKUTURI
processor. This allows the TUKUTURI to program several transfers and
process data blocks transferred previously, while the DMA transfers the next
blocks in the background.

The first implementation of the exemplary convolutional neural network
on the TUKUTURI processor processed one input frame in about 1.2 x 10^9
cycles. With a clock frequency of 100 MHz, this corresponds to about 0.08
fps. Using the convolution co-processor, the cycle count could be reduced to
about 243 x 10^6 cycles, corresponding to a frame rate of about 0.411 fps. This
is a speed-up of factor 5.1. Using the capabilities for background transfers,
the total cycle count was reduced to about 101 x 10^6 cycles per frame, which
is an additional speed-up of factor 2.4, leading to about 0.99 fps. According
to Table 6.4, about 4.8 x 10^9 operations are needed per frame. Therefore, this
implementation reaches about 4.8 GOPs.
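
The arithmetic behind these figures can be retraced with a short sketch. The
cycle counts, the clock frequency and the operation count per frame are taken
from the text and Table 6.4; the names are hypothetical. Since the cycle
counts are rounded in the text, the recomputed values agree only approximately.

```python
CLOCK_HZ = 100e6                  # clock frequency stated in the text
OPS_PER_FRAME = 4_796_792_784     # total from Table 6.4

# Approximate cycles per frame for the three implementation stages.
CYCLES = {
    "baseline": 1.2e9,
    "with convolution co-processor": 243e6,
    "with co-processor and background DMA": 101e6,
}


def frame_rate(cycles_per_frame, clock_hz=CLOCK_HZ):
    """Frames per second for a given cycle count per frame."""
    return clock_hz / cycles_per_frame


def giga_ops_per_second(cycles_per_frame):
    """Achieved throughput in 10^9 operations per second."""
    return frame_rate(cycles_per_frame) * OPS_PER_FRAME / 1e9


for stage, cycles in CYCLES.items():
    print(f"{stage}: {frame_rate(cycles):.3f} fps, "
          f"{giga_ops_per_second(cycles):.2f} GOPs")

# Throughput that 30 frames per second would require (see subsection 6.5.1).
required = 30 * OPS_PER_FRAME / 1e9
```

Comparing `required` (about 144 GOPs) with the throughput of the final stage
shows that this implementation remains roughly a factor of 30 below the
30 fps target.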


6.6 Summary 


Convolutional neural networks and methods of deep learning have been used
successfully in image processing, segmentation and classification tasks. The
huge amount of processing power needed by CNNs for Scene Labeling tasks
in advanced driver assistance systems, combined with the resource restrictions
in embedded systems, poses a challenge for hardware architects. FPGAs have
been shown to be a suitable platform for the implementation of CNNs for Scene
Labeling.


7.1 Introduction


The progress in resolution of automotive radar sensors involves a considerable 
increase in data-rate and computational throughput. Dedicated processing 
architectures have to be investigated in order to manage the tremendous 
amount of data. Even for early prototype development platforms, the per- 
formance of existing PC-based frameworks and tools is no longer sufficient 
to cope with the data processing of many parallel radar receiver channels at 
very high sampling rates. 

This chapter presents an FPGA-based signal processing architecture capable 
of handling 16 parallel MIMO radar receiving channels with a sampling 
frequency of 250 MHz each. Raw data is transferred from the AD-converters 
to the FPGA where subsequent processing steps are performed, involving FIR 
filtering and decimation, a two-dimensional FFT, local noise level 
estimation and subsequent target detection. An external DRAM is used for 
storing multiple radar measurements which are finally evaluated altogether 
(so-called chirp-sequence modulation). 

Data post-processing is outsourced onto a PC running ADTF, an 
automotive framework for graph-based real-time data processing. The combination 
of a fast, FPGA-based preprocessing unit with a more flexible, PC-based 
development platform maximizes processing performance and minimizes 
development time. The less mature angular MIMO processing algorithms can 
thus be evaluated with the help of C-based algorithms running in ADTF, while 
the simple, but calculation intensive FFT processing is implemented entirely 
as a hardware accelerator in a Virtex-7 FPGA device from Xilinx. 


7.2 Signal Processing for Automotive Radar Sensors 


After AD-conversion, the raw radar signals enter the processing unit, consecutively 
passing through all necessary signal processing steps. Different levels 
of data abstraction and representation can be identified, which range from low 
level time signals up to complex environmental models. 

In this chapter, only the extraction of discrete scattering centers will be 
considered. The result is a list of reflections, each having multiple features, 
like for instance Cartesian coordinates, radar-cross-section (RCS), relative 
velocity or signal-to-noise ratio. Further processing of these reflections would 
incorporate clustering, classification and environment modeling. 

An intermediate state is the extraction of relevant targets from the two- 
dimensional frequency spectrum (cf. Subsection 7.2.2). At this point, the 
range and velocity of the targets have already been determined, while 
the angular information is not evaluated yet. Nevertheless, the data rates 
are already reduced by a significant amount, so that at this stage the data 
transfer interface between FPGA and PC-based signal processing can be 
established. 


7.2.1 FMCW Radar System Architecture 


The usage of frequency-modulated continuous-wave (FMCW) radar sensors 
can be advantageous in short range applications, especially due to their high 
range resolution capability and much lower peak power requirements. In 
contrast to a pulsed radar system, the transmitter and receiver operate at the 
same time, which imposes some constraints on the transmitted signals. In order 
to measure the time-of-flight, i.e. the range towards an object, some kind of 
time-varying information needs to be added to the transmitted waveforms. 
The signal has to be modulated in an unambiguous, non-repetitive fashion. A 
constant sine wave, for instance, cannot be used for range estimation, because 
it becomes ambiguous once the phase has advanced by one full cycle, i.e. by $2\pi$. 

One widely used modulation scheme consists of linearly modulated frequency 
chirps (cf. Figure 7.1). Two important parameters are the used bandwidth $F$ 
and the modulation time $T$, which determine the slope $S$ of the frequency 
ramp. Besides, other kinds of modulation schemes exist, e.g. frequency shift 
keying, various phase modulation or pseudo-noise coding principles. 

Figure 7.1 FMCW ramp waveform shown as frequency over time f(t). The solid line 
represents the transmitted signal (TX) while the dashed line is the received signal (RX). 

In the case of a linear frequency modulation, the time-of-flight $\Delta t$ can be 
directly translated into a frequency difference (so-called beat frequency $f_b$). 
With the help of a mixer device in the receiver, this frequency difference can 
be measured efficiently and estimated by subsequent signal processing blocks. 
Finally, the target range $r$ can be obtained from the estimated beat frequency 
value. However, as moving targets engender an additional frequency shift $f_d$ 
(Doppler frequency), the measured frequency will consist of a superposition 
of a range and a velocity dependent component. 


$$f_b = f_r + f_d = \frac{2 r F}{c\,T} + \frac{2 v_r}{\lambda}$$


With the help of advanced modulation waveforms, the occurrence of range-Doppler 
ambiguities can be significantly reduced, while being able to estimate 
both frequency components individually at the same time [1]. This can be 
achieved by using multiple, aligned FMCW chirps. Furthermore, these ramp 
signals should have a very steep slope, so that the range dependent frequency 
part $f_r$ dominates in the beat frequency $f_b$. For a sufficiently small target 
velocity, the Doppler frequency $f_d$ is likewise small enough so that the range 
estimation can be carried out directly from $f_b$ by simply neglecting the minor 
$f_d$ contribution. However, the Doppler information is not completely lost and 
can be regained from the inherent phase measurement which is present in 
the consecutive frequency ramps. For this purpose, it is necessary that the 
ramp sequence is strictly aligned and that the data sampling always occurs at 
the same time instant w.r.t. the chirp modulation. The underlying processing 
technique is shown in Figure 7.2 and relies on a two-dimensional spectrum 
analysis. The big advantage is the unambiguous determination of both the 
range and velocity frequency component of each target. 

For the angle estimation, two different measurement principles can be 
used. One possibility is a steerable antenna, which has a high directivity. 
Only targets which reside inside the antenna beam will contribute to the 
received signal in a significant manner. The detection space has to be scanned 
individually, i.e. each possible direction of arrival (DOA) will be measured 
separately. An alternative to a mechanically steered antenna is the use of an 
antenna array, where each antenna element is fed by a time delayed version 
of the transmit signal. The phase shift of the antenna feeds can be changed 
electronically. Depending on the phase relationships of the antenna elements, 
the directivity can be swiveled, which is also referred to as electronic beam 
steering or phased array. 

The second class of angle estimation relies on a phase measurement of the 
received signals. Within a static antenna array, the measured phase differences 
will depend on the DOA of the target reflections. This property is exploited 
by many different algorithms in the field of array processing [2]. A major 
advantage of a fixed antenna array is the simultaneous measurement over a 
wide opening angle. The region of interest does not have to be scanned and data 
can be collected in a single, instantaneous snapshot. In general, the achievable 
angular resolution and separability depend on the number of channels as well 
as on the aperture size of the array. 

In the case of a receiving array, each channel will require a dedicated 
frequency mixer, amplifier and AD-converter, which increases the total cost 
of the system. Hence, the usage of advanced algorithms can be considered in 
order to increase resolution without additional receiving channels [3]. These 
algorithms are often said to achieve a superresolution because they perform 
better than a conventional Bartlett beamscan algorithm (cf. [2], pp. 1142). 
Another possibility is the usage of multiple transmitting channels (multiple 
input, multiple output; MIMO). A MIMO system has a better efficiency 
because the number of virtual channels is larger than the real number of 
channels, thus resulting in lower hardware effort. 

In Figure 7.3, a linear MIMO antenna array is shown with two transmitter 
antennas, which are depicted as circles on the left. The physical receiving 
array (blue) is extended by several virtual antenna positions. The underlying 
signal processing remains the same as in the single transmitter case; however, 
the full virtual array can be used, resulting in an increased accuracy and 
object separation capability. In order to separate the signals originating from 
different transmitting antennas at the receiver side, some kind of orthogonality 
has to be introduced. A straightforward approach is time-division 
multiplexing (TDM), i.e. only one transmitter operates at a time. 
Other possible techniques comprise frequency-division multiplexing 
(FDM) or code-division multiple access (CDMA). 

Figure 7.3 Possible MIMO antenna array design: The physical receiver array (blue) is 
extended by several virtual antennas (red squares) due to the second transmitter TX 2. 


7.2.2 Two-Dimensional Spectrum Analysis for Range and Velocity Estimation 


Multi-target scenarios are usually encountered in automotive radar applications. 
Especially static targets are often present in the field of view, arising 
from roadside structures, e.g. guardrails and reflector posts. Furthermore, 
with increased resolution, multiple scattering centers are visible from single 
objects, e.g. the shape of car bodies is seen as a large cloud consisting of many 
reflections [4]. 

In order to resolve and separate proximate targets, a good range resolution 
and thus frequency resolution is required. One widely used technique providing 
a fast and robust frequency estimation is the fast Fourier transform (FFT). 
For a further increase in range resolution, advanced frequency estimation 
algorithms like autoregressive (AR) models or multiple signal classification 
(MUSIC) can be employed [5, 6]. Besides their higher computational requirements, 
they suffer from the fact that the number of detections needs to be 
known prior to the estimation. For this reason, the presented system relies on 
the more convenient FFT-based spectrum analysis. 

The Doppler frequency estimation is carried out by a second FFT. Instead 
of the raw time signals, the frequency bins of the first FFT are used as input 
signal. In other words, the second FFT measures the ramp-to-ramp phase 
offset for each target. This offset depends solely on the Doppler shift of the 
target, because the radar system ensures a coherent sampling of the transmitted 
frequency chirps. Only if the target is moving relative to the sensor will the 
measured phase value vary between the consecutive chirp ramps. 

As depicted in Figure 7.2, targets with different ranges and different 
velocities are separated after this step. In contrast to many other FMCW 
modulation forms, a matching step to find corresponding ranges and velocities 
is no longer required, because the values are directly obtained from the 
two-dimensional indices. Furthermore, the computational effort stays constant and 
is thus independent from the number of prevailing targets. This property plays 
a key role in scenarios with many scattering points as often encountered with 
high resolution automotive radar sensors. 
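
The two-dimensional processing can be illustrated with a small, self-contained sketch. It is not the implementation of the presented system: a naive DFT stands in for the FFT, the signal is modeled as an ideal complex baseband exponential, and the targets are placed exactly on hypothetical range and Doppler bins so that the result is easy to verify.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform; stands in for an FFT in this sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def simulate(n_samples, n_ramps, targets):
    """Ideal chirp-sequence data; targets = [(amplitude, range_bin, doppler_bin)]."""
    ramps = []
    for l in range(n_ramps):
        ramp = []
        for t in range(n_samples):
            s = 0j
            for a, kr, kd in targets:
                # Beat frequency within a ramp plus ramp-to-ramp phase offset.
                phase = 2 * cmath.pi * (kr * t / n_samples + kd * l / n_ramps)
                s += a * cmath.exp(1j * phase)
            ramp.append(s)
        ramps.append(ramp)
    return ramps


def range_doppler_map(ramps):
    """Return spectrum[range_bin][doppler_bin] of a list of sampled ramps."""
    first = [dft(ramp) for ramp in ramps]  # first transform: one per ramp
    n_range = len(first[0])
    # Second transform: over the ramp index, separately for every range bin.
    return [dft([first[l][k] for l in range(len(ramps))]) for k in range(n_range)]


# Two targets share a range bin, two share a Doppler bin (illustrative values).
targets = [(1.0, 5, 2), (0.5, 5, 6), (0.8, 11, 2)]
rd = range_doppler_map(simulate(16, 8, targets))
peaks = sorted(((abs(rd[k][d]), k, d) for k in range(16) for d in range(8)),
               reverse=True)[:3]
for magnitude, k, d in peaks:
    print(f"range bin {k}, Doppler bin {d}, magnitude {magnitude:.1f}")
```

Each peak directly carries its pair of indices, so no matching of ranges and velocities is needed, and the amount of computation does not depend on the number of targets.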

Another benefit of the two-dimensional spectral processing is the higher 
sensitivity. Particularly small targets with a low radar cross-section (RCS) can 
be masked by the noise floor of the first FFT. These targets become visible 
only with the help of the additional processing gain of the second FFT. Thus, 
each output bin of the first FFT shall be taken into account and the full 2D 
matrix should be evaluated before any target detection takes place. 


7.2.3 Thresholding and Target Detection 


A crucial point in the signal processing chain is the separation of different target 
reflections in the two-dimensional power spectrum. With the help of this step, 
data of relevant objects will be isolated from the random noise components. 
This leads to a significant reduction of data rate and thus lowers the com- 
putational performance requirements for the downstream signal processing 
steps. 

The target detection is carried out with the help of an adaptive threshold, 
reducing the effects of local noise and clutter components. By means 
of constant false alarm rate (CFAR) processing, the probability of false 
alarm remains constant, irrespective of varying operational and environmental 
conditions. 

Different types of CFAR processors can be used for noise level estimation. 
This section presents two of the most extensively used variants, the 
cell-averaging (CA-CFAR) and the ordered-statistic (OS-CFAR) detector. 


Cell-Averaging CFAR (CA-CFAR) 
The basic task of a CFAR detector is to provide an adaptive threshold, which 
is then used for the subsequent detection step, i.e. the decision whether a specific 
cell contains a target or just irrelevant noise components. In contrast 
to a fixed threshold, an estimate of the local background noise level is used as 
threshold, which has to be obtained automatically and separately for each cell 
under test (CUT). Many different methods exist to provide such an estimate, 
each leading to different classes and variants of CFAR detectors. 

A simple yet powerful approach is the mean value of a number of window 
cells in proximity to the CUT (see Figure 7.4). This variant is known as cell 
averaging CFAR, or CA-CFAR. The assumption made in this case is that all 
window cells contain only noise components and thus the mean value is a 
good estimate of the noise variance. In the case of white Gaussian noise, the 
mean corresponds to the maximum likelihood estimator. However, for 
many radar systems the assumption of normally distributed noise turns out to 
be inaccurate [7]. 

Figure 7.4 CA-CFAR sliding window implementation. 

When designing a CFAR detector, an important parameter is the window 
size around the CUT. On the one hand, a larger window size reduces the 
statistical estimation error; on the other hand, local differences in the noise 
level can be blurred by a large window. A tradeoff has to be made between the 
deviation from the requested false alarm rate due to the estimation error and 
the local sensitivity of the adaptive threshold which results from smoothing. 
Furthermore, the computational effort becomes more relevant with increasing 
window sizes. 
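
A one-dimensional CA-CFAR can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the detector of the presented system: the guard cells next to the CUT, the wrap-around at the edges and all parameter names and values are choices made for this example only.

```python
def ca_cfar(power, n_window, n_guard, scale):
    """Return the indices of all cells that exceed scale * local mean.

    power    : list of non-negative power values, e.g. |X[k]|^2
    n_window : number of window cells on each side of the cell under test
    n_guard  : number of guard cells on each side that are skipped
               (assumption of this sketch, not taken from the text)
    scale    : threshold factor applied to the noise estimate
    """
    n = len(power)
    detections = []
    for cut in range(n):
        window = []
        for offset in range(n_guard + 1, n_guard + n_window + 1):
            # Wrap around at the edges, since a DFT spectrum is periodic.
            window.append(power[(cut - offset) % n])
            window.append(power[(cut + offset) % n])
        noise = sum(window) / len(window)   # cell-averaging noise estimate
        if power[cut] > scale * noise:
            detections.append(cut)
    return detections


# Flat noise floor with one strong cell (illustrative values).
spectrum = [1.0] * 64
spectrum[20] = 50.0
detections = ca_cfar(spectrum, n_window=8, n_guard=1, scale=8.0)
print(detections)
```

A larger `n_window` averages over more cells and therefore reduces the estimation error, but it also costs more additions per CUT and smooths out local changes of the noise level, which is the tradeoff described above.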


Ordered-Statistic CFAR (OS-CFAR) 
In the case of white Gaussian noise, the CA-CFAR performs very well in 
single target scenarios. However, in a multi-target environment, the estimated 
noise level will deviate due to interfering targets inside the window cells. 
Robust statistics can be used in order to suppress outliers arising from other 
targets inside the window. A commonly used variant is the ordered-statistic 
(OS-CFAR) detector, which relies on sorting the values inside the window, similar 
to a median filter. 

The algorithm performs the following steps for each cell under test 
(CUT): 


• Sort all cells inside the window by their absolute square value 
• Take out the k-th value of the sorted list. This value serves as an estimate 
  for the local noise level 
• Apply a scaling factor to the noise estimate 
• Compare the scaled estimated noise value against the CUT 
• Decide whether the CUT is a valid target 
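
The five steps above can be written down directly. The following sketch is only meant to make the procedure concrete; the window layout (cells on both sides of the CUT, wrap-around at the edges), the 1-based rank `k` and the numeric values are assumptions of this example.

```python
def os_cfar(power, n_window, k, scale):
    """Ordered-statistic CFAR following the listed steps.

    power    : list of squared magnitudes
    n_window : window cells on each side of the cell under test (CUT)
    k        : 1-based rank of the sorted value used as noise estimate
    scale    : scaling factor applied to the noise estimate
    """
    n = len(power)
    detections = []
    for cut in range(n):
        window = [power[(cut + o) % n]
                  for o in range(-n_window, n_window + 1) if o != 0]
        ordered = sorted(window)        # step 1: sort the window cells
        noise = ordered[k - 1]          # step 2: take the k-th value
        threshold = scale * noise       # step 3: apply the scaling factor
        if power[cut] > threshold:      # steps 4 and 5: compare and decide
            detections.append(cut)
    return detections


# Two targets of very different strength inside one window (illustrative).
spectrum = [1.0] * 64
spectrum[20] = 50.0
spectrum[24] = 400.0
detections = os_cfar(spectrum, n_window=8, k=12, scale=8.0)
print(detections)
```

In this example both cells are detected, because the strong neighbor ends up at the top of the sorted list and does not affect the k-th value. A plain mean over the same 16 window cells around cell 20 would be (15 + 400) / 16, about 25.9, which gives a threshold of about 207 and masks the weaker target.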


Especially in the field of high-resolution radar, big window sizes are required, 
because large and widespread targets will easily occupy multiple window cells. 
The complete sorting of the whole window is not a very efficient solution. 
Only a single value of the sorted list is of interest, while all other values are 
discarded. Furthermore, when evaluating neighboring CUTs, the previously 
sorted list can be used as starting point. 

Several optimizations of the algorithm aim at these specific sorting 
characteristics. For instance, a “k-th maximum search” can be performed 
which finds the greatest value and removes it from the set. This step is repeated 
until the k-th value has been found [8]. Another efficient realization uses a 
sliding window approach which keeps a sorted list in memory [9]. Now, when 
moving the window one step further, the insertion of a single value requires 
at most N comparisons. 

Besides, if one is only interested in the decision result, the complete 
sorting of the list can be bypassed and the detection step can be performed 
in a “rank-only” manner [10]. For this purpose, the inverse threshold is 
applied to the CUT and the result is compared to each cell inside the 
window. The binary comparison results, i.e. 1 if the scaled CUT is bigger and 0 
if not, can be summed up to get a rank. Only if the rank is greater than 
k, the CUT is considered a valid detection. This approach is depicted in 
Figure 7.5. 
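
A minimal sketch of the rank-only decision is given below. It assumes the same hypothetical window layout as a sort-based detector with cells on both sides of the CUT, and it uses a 1-based rank `k` with the condition `rank >= k`; whether the comparison is written as "greater than" or "greater or equal" depends on how k is counted.

```python
def os_cfar_rank_only(power, n_window, k, scale):
    """OS-CFAR decision without sorting: count the window cells below CUT / scale."""
    n = len(power)
    detections = []
    for cut in range(n):
        scaled_cut = power[cut] / scale      # inverse threshold applied to the CUT
        rank = 0
        for o in range(-n_window, n_window + 1):
            if o != 0 and scaled_cut > power[(cut + o) % n]:
                rank += 1                    # one binary comparison per window cell
        # At least k cells below the scaled CUT means that the k-th smallest
        # window value, multiplied by scale, is below the CUT.
        if rank >= k:
            detections.append(cut)
    return detections


spectrum = [1.0] * 64
spectrum[20] = 50.0
spectrum[24] = 400.0
detections = os_cfar_rank_only(spectrum, n_window=8, k=12, scale=8.0)
print(detections)
```

Each CUT needs one comparison per window cell, and the decision is identical to comparing the CUT against the scaled k-th value of a fully sorted window.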

In contrast to a complete sorting, this algorithm depends only on 
N comparisons. The complexity is thus linear for growing window sizes. The 
target decision result is exactly the same, i.e. there is no performance loss. 
The only disadvantage is the lack of the k-th value, which is unknown in the 
rank-only case. This value can serve as an estimate for the local noise level 
and can be required by subsequent signal processing blocks. A supplementary 
estimation of this value can be considered, e.g. the mean value of all cells 
which have been classified as noise. 

Figure 7.5 Rank-only OS-CFAR implementation. 


Non-Coherent Integration (NCI) 

Even though the detection takes place before the angular processing, the 
data of multiple receiving channels can be used to further improve detection 
performance. An integration of all channels prior to the detection step turns 
out to be beneficial, assuming that the noise components are independent and 
identically distributed (i.i.d.). However, the phase relationship of the signals 
between adjacent channels is not known prior to the angle estimation and can 
take any value. When summing up the complex values of each channel, the 
signals can interfere either constructively or destructively. In order to avoid 
a cancellation of the signal power, the integration takes place in the power 
spectra, which is also known as non-coherent integration (NCI). 

In the following, the noise components are modeled as additive white 
Gaussian noise, which means that a zero-mean, normally distributed signal n[t] 
is added to the received signal s[t]. 

It can be shown that both the real and imaginary parts of the noise 
components follow a zero-mean normal distribution after transformation into 
the frequency space [11]. The variance of $N[k]$ depends on the input variance as 
well as on the length of the input signal, i.e. the length of the FFT. When taking 
longer signal sequences, the signal-to-noise ratio can be improved (so-called 
processing gain). 


$$\tilde{S}[k] = S[k] + N[k]$$


The power spectrum can be calculated by summing up the squared values of 
real and imaginary part. As a sum of two squared, i.i.d. Gaussian variables, 
the squared magnitude $|N[k]|^2$ follows a chi-squared distribution $\chi^2(n)$ 
with $n = 2$ degrees of freedom: 

$$|N[k]|^2 = N_{re}[k]^2 + N_{im}[k]^2$$
$$|N[k]|^2 \sim \chi^2(2)$$

When summing up multiple receiving channels, i.e. multiple i.i.d. random 
variables, the result will again be chi-squared distributed but with a higher 
degree of freedom. 

$$N_{NCI}[k] = \sum_{i=1}^{m} |N_i[k]|^2 \sim \chi^2(2m)$$

Figure 7.6 Additive white Gaussian noise model. 


In contrast to the FFT, the mean value of the noise power scales linearly 
with the number of channels in the same way the signal power does. Therefore, 
the signal-to-noise ratio is not improved. However, the variance relative to the mean decreases, 
which has an effect on the probability of false alarm. An example measurement 
is depicted in Figure 7.7, comparing the noise distribution of one channel and 
the distribution after the integration of 32 channels. It can be observed that for 
the same threshold level, a lower probability of false alarm can be achieved 
due to the lower variance of the blue histogram. The other way round, for the 
same probability of false alarm, a lower threshold level can be used, which 
increases the detection rate. 
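
The effect of the non-coherent integration on the noise statistics can be reproduced with a small Monte Carlo sketch. It assumes unit-variance Gaussian real and imaginary parts per channel and uses 1 and 32 channels as in the example measurement; the number of simulated cells and the threshold factor of 2 are arbitrary choices for this illustration.

```python
import random
import statistics


def noise_power_sample(rng, sigma=1.0):
    """|N[k]|^2 of one channel: sum of two squared i.i.d. Gaussian values."""
    return rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2


def nci_noise_samples(n_channels, n_cells, seed=0):
    """Noise power after summing the power spectra of n_channels channels."""
    rng = random.Random(seed)
    return [sum(noise_power_sample(rng) for _ in range(n_channels))
            for _ in range(n_cells)]


def relative_spread(samples):
    """Standard deviation divided by the mean."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


def false_alarm_rate(samples, factor):
    """Fraction of noise cells above factor * mean noise power."""
    threshold = factor * statistics.fmean(samples)
    return sum(1 for s in samples if s > threshold) / len(samples)


one = nci_noise_samples(1, 20000)
many = nci_noise_samples(32, 20000)
pfa_one = false_alarm_rate(one, 2.0)
pfa_many = false_alarm_rate(many, 2.0)

print("mean noise power:", statistics.fmean(one), statistics.fmean(many))
print("relative spread:", relative_spread(one), relative_spread(many))
print("false alarm rate at 2 x mean:", pfa_one, pfa_many)
```

The mean noise power grows with the number of channels, so the signal-to-noise ratio does not improve, while the spread relative to the mean shrinks roughly with the square root of the number of channels. For the same threshold relative to the mean, far fewer noise cells exceed it after integration.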


7.2.4 Angle Estimation 


In Subsection 7.2.1 the measurement principle of antenna arrays has been 
introduced briefly. In general, the angle estimation is based on the measured 
phase offset $\phi_n$ between different antenna positions (cf. Figure 7.8). 

Figure 7.7 Histogram of a noise measurement showing the chi-squared distribution before 
and after NCI. 

Figure 7.8 Uniform linear antenna array with spacing d and resulting steering vector $v(\alpha)$. 


Since the antenna positions are known, a conclusion may be drawn on the 
direction of arrival. For this purpose, the introduction of a steering vector $v(\alpha)$ 
can be useful. This vector contains the expected phase offsets, equivalent to 
an ideal incident signal from a certain angle $\alpha$: 


$$v(\alpha) = \left[ e^{j\phi_1(\alpha)} \;\; e^{j\phi_2(\alpha)} \;\; e^{j\phi_3(\alpha)} \;\; \cdots \;\; e^{j\phi_N(\alpha)} \right]$$


In the case of a linear array with $N$ elements, the steering vector is simply 
constructed from the distance $d$ between two antenna elements, the wavelength 
$\lambda$ and the incident angle $\alpha$. The phase of the first element is normalized to 
zero and the amplitudes are assumed to be all equal to one: 


$$v(\alpha) = \left[ 1 \;\; e^{j 2\pi d \sin\alpha / \lambda} \;\; e^{j 2\pi \cdot 2d \sin\alpha / \lambda} \;\; \cdots \;\; e^{j 2\pi (N-1) d \sin\alpha / \lambda} \right]$$


Similar to the spectral estimation, different classes of algorithms can be 
identified. Some procedures like the Bartlett beamformer just calculate a 
weighted sum of the received signal vector x. This is done for each possible 
DOA and results in an angular spectrum: 


$$P(\alpha) = \left| x^H v(\alpha) \right|^2$$


The magnitude P represents the correlation between the received signal and 
the steering vector. A subsequent maximum search extracts the estimated target 
angle. The separation of two targets is also possible by simply extracting 
the two largest peaks; however, attention has to be paid to the occurrence of 
sidelobes. Furthermore, the width of the mainlobe determines the separability, 
which is often not satisfactory. 

More sophisticated methods include the Capon beamformer, also 
known as minimum variance estimator, which achieves a better angular 
separability. Another important class is known as subspace based methods, 
incorporating MUSIC and ESPRIT as the most prominent examples. Finally, 
maximum-likelihood estimators exist, which need to know the model order 
in advance, i.e. the number of targets. However, if the targets have already 
been separated by different ranges and velocities, the estimation of the model 
order is feasible because only few targets will be present, in most of the cases 
only one. A comprehensive overview of existing methods and algorithms is 
given in [2]. 
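
The Bartlett beamscan described earlier in this subsection can be sketched as follows. The example assumes a noise-free snapshot of a single target, a uniform linear array with eight elements at half-wavelength spacing and a search grid of one degree; these values are chosen for illustration and are not parameters of the presented sensor.

```python
import cmath
import math


def steering_vector(alpha_rad, n_elements, spacing_over_lambda):
    """v(alpha) of a uniform linear array; the first element has zero phase."""
    return [cmath.exp(2j * math.pi * n * spacing_over_lambda * math.sin(alpha_rad))
            for n in range(n_elements)]


def bartlett_spectrum(x, angles_rad, spacing_over_lambda):
    """P(alpha) = |x^H v(alpha)|^2 evaluated on a grid of candidate angles."""
    spectrum = []
    for alpha in angles_rad:
        v = steering_vector(alpha, len(x), spacing_over_lambda)
        correlation = sum(xi.conjugate() * vi for xi, vi in zip(x, v))
        spectrum.append(abs(correlation) ** 2)
    return spectrum


# Ideal snapshot of one target at 20 degrees, 8 elements, d = lambda / 2.
true_angle = math.radians(20.0)
x = steering_vector(true_angle, 8, 0.5)
grid = [math.radians(a) for a in range(-90, 91)]
p = bartlett_spectrum(x, grid, 0.5)
estimate_deg = math.degrees(grid[p.index(max(p))])
print(f"estimated angle: {estimate_deg:.1f} degrees")
```

The maximum search over the angular spectrum returns the grid angle whose steering vector correlates best with the snapshot. With noise or a second target, the mainlobe width and the sidelobes of this spectrum limit the achievable separation, which motivates the more sophisticated methods mentioned above.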


7.3 Hardware Accelerators for MIMO Radar Systems 
7.3.1 Basic Structure of a Streaming Hardware Accelerator 


Figure 7.9 shows the overview of a hardware accelerator for high-resolution 
MIMO radar sensors. A high degree of parallelism can be observed, 
due to the pairwise independence of the receiving channels. Up to the NCI 
step, each data stream is processed on its own. 

The spectral analysis is carried out with the help of an FFT, whose efficient 
implementation in streaming applications is well understood. A critical step in 
the design process of this block is the specification of the maximum FFT 
lengths, as this parameter essentially determines the resource usage. Furthermore, 
when using fixed-point arithmetic, the word length and data scaling 
behavior can have major effects on performance and efficiency. This aspect is 
investigated in Subsection 7.3.2. 

Regarding the two-dimensional FFT, a concept for data storage and 
transfer has to be developed. The storage of a complete chirp sequence, 
i.e. a set of K ramps, is required in order to perform the second dimension 
FFT processing. This mainly dictates the size of the memory, which grows 
rapidly due to the influence of further key parameters. In general, increasing 
the resolution in range, in velocity or in the angular domain also increases 
the required memory size. It turns out that this size rapidly exceeds several 
MBytes. Thus, the usage of large DRAMs becomes necessary since the size 
of an on-chip SRAM cache memory is not sufficient anymore. An analysis for 
different modulation and system parameters can be found in [12]. 

Figure 7.9 Architecture of a streaming hardware accelerator. 
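
A rough estimate shows how quickly the storage for one chirp sequence grows. Only the number of 16 receiving channels is taken from this chapter; the number of range bins, the number of ramps K and the word length are hypothetical values chosen for illustration, and the analysis in [12] should be consulted for actual system parameters.

```python
def chirp_sequence_memory_bytes(n_channels, n_range_bins, n_ramps, bytes_per_value):
    """Memory needed to hold the first-FFT output of a complete chirp sequence."""
    return n_channels * n_range_bins * n_ramps * bytes_per_value


# 16 channels (from the text); the remaining values are assumptions:
# 1024 range bins, K = 256 ramps, 4 bytes per complex value (2 x 16 bit).
size = chirp_sequence_memory_bytes(16, 1024, 256, 4)
print(f"{size / 2**20:.0f} MiB")
```

Doubling the resolution in any single dimension doubles this figure, which is why on-chip SRAM is quickly exhausted.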

Regarding the throughput of the memory, the addressing scheme heavily 
affects the performance in the case of a DRAM. The row opening and closing 
delays, as well as the read and write transfers, can be completely hidden due 
to the streaming nature of the application. The problem of transforming large 
two-dimensional matrices with the help of DRAMs has been investigated in 
[13]. An addressing scheme suitable for the application to chirp-sequence 
processing has been derived in [14]. 

Depending on the type of threshold estimation, the calculation can be a 
simple mean value in the case of CA-CFAR, but it can also become very costly 
in the case of a sorted list (OS-CFAR). Subsection 7.3.3 presents an efficient 
architecture based on the rank-only OS-CFAR which avoids a complete sorting 
of the values inside the window. 


7.3.2 Pipelined FFT Accelerator 


For streaming applications, pipelined FFT architectures provide a very high 
throughput. The usage of dedicated hardware accelerators is especially useful 
for real-time applications, where a high degree of capacity utilization can be 
achieved. Many different implementation forms have been reported in the past 
decades. One important parameter is the used butterfly architecture, which can 
be based on a Radix-2, Radix-4 or Split-Radix decomposition, just to mention a 
few. In practice, multiple butterflies are cascaded to achieve longer transform 
lengths. Another important design decision is the use of a single-path vs. 
multi-path implementation. 

A straightforward implementation of the Cooley and Tukey FFT algorithm 
is shown in Figure 7.10 [15]. It is realized with Radix-2 butterflies which are 
combined in a decimation-in-frequency (DIF) decomposition. This architecture 
can process one sample per clock cycle and needs log N - 1 multipliers. 
Furthermore, several buffer memories are required which have the total size 
3N/2. 

Figure 7.10 Radix-2 FFT implementation based on a multi-path delay commutator (MDC) 
pipeline. 

When analyzing the data flow, it turns out that the butterflies and the 
multipliers are only used half of the time. Furthermore, only half of the 
memories store valid data at the same time. Several optimizations have been 
proposed in order to increase the utilization of the multipliers and memories. 
For example, when using feedback networks, the efficiency in terms of memory 
usage can be improved. This class of pipeline architectures is known as 
single-path delay feedback (SDF) network (cf. Figure 7.11) [16]. 

When using Radix-4 butterflies, the number of multipliers can be reduced 
as well, at the cost of more complicated butterflies requiring more dedicated 
adders. 

Another FFT algorithm for pipelined implementations has been proposed 
by He and Torkelson [17] and is known as the Radix-$2^2$ algorithm. This optimization 
simplifies the traditional Radix-2 FFT decomposition by considering two 
butterfly stages at once. When modifying some of the twiddle factors, all 
multiplications after the first stage can be omitted or rather transformed into 
a trivial multiplication by +j. Applying this modification to the presented 
Radix-2 SDF architecture, half of the multipliers can be saved. Table 7.1 
compares different implementations. 

In the case of multiple parallel data streams, the utilization of the complex 
adders and multipliers can be further increased to 100% by using a modified 
MDC architecture with a proper scheduling of the different data streams [18]. 
In the case of MIMO systems this approach outperforms the Radix-$2^2$ SDF 
implementations, which seem to be superior in single channel applications. 

Figure 7.11 Radix-2 FFT implementation based on a SDF pipeline. 

Table 7.1 Resource usage of different pipelined FFT implementations [17] 

                 No. of Multipliers   No. of Adders   Memory Size 
Radix-2 MDC      2(log_4 N - 1)       4 log_4 N       3N/2 - 1 
Radix-2 SDF      2(log_4 N - 1)       4 log_4 N       N - 1 
Radix-4 SDF      log_4 N - 1          8 log_4 N       N - 1 
Radix-2^2 SDF    log_4 N - 1          4 log_4 N       N - 1 
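
The resource formulas of Table 7.1 can be evaluated for a concrete transform length with the small helper below. It assumes that the logarithms in the table are taken to base 4 and that N is a power of 4, and it reproduces the memory size of the MDC variant as printed in the table; the function and dictionary names are made up for this example.

```python
def log4(n):
    """Exact base-4 logarithm for transform lengths that are powers of 4."""
    log2 = n.bit_length() - 1
    if n < 4 or (1 << log2) != n or log2 % 2:
        raise ValueError("N must be a power of 4")
    return log2 // 2


def pipelined_fft_resources(n):
    """(multipliers, adders, memory words) per architecture for length n."""
    s = log4(n)
    return {
        "Radix-2 MDC": (2 * (s - 1), 4 * s, 3 * n // 2 - 1),
        "Radix-2 SDF": (2 * (s - 1), 4 * s, n - 1),
        "Radix-4 SDF": (s - 1, 8 * s, n - 1),
        "Radix-2^2 SDF": (s - 1, 4 * s, n - 1),
    }


resources = pipelined_fft_resources(1024)
for name, (multipliers, adders, memory) in resources.items():
    print(f"{name:14s} multipliers={multipliers:2d} adders={adders:2d} memory={memory}")
```

For N = 1024 the Radix-$2^2$ SDF variant combines the low multiplier count of Radix-4 with the low adder count of Radix-2, which is the reason it is considered attractive for single channel applications.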

Even though not optimal in terms of butterfly utilization, a Radix-2 based 
architecture provided by the Xilinx IP Core is used for the presented MIMO 
radar system [19]. The principal reason is the faster implementation and 
integration time. The efficiency in terms of resource usage can be improved 
in future work. 


Fixed-Point Noise 


In digital signal processing systems, all computations are carried out with 
discrete values. The majority of arithmetic units use fixed word lengths which 
always have a limited accuracy. Consequently some amount of quantization 
noise is added for each rounding operation. Often floating-point values are 
used, because they work very well in most environments, regardless of the 
input signal characteristics. However, if the dynamic range of the input signal 
is known to a certain extent, fixed-point arithmetic can considerably reduce 
the resource usage. Many FFT accelerators use integer operations and various 
models for the engendered quantization noise have been developed. 

The quantization noise due to truncation or rounding after a multiplication 
is often modeled as additive white noise source with a uniform distribution. 
Even though not accurate under all circumstances, this model is appropriate 
if the input signal has a sufficiently large bandwidth and amplitude [20]. It 
can thus be applied to a radar system, due to the wide bandwidth background 
noise, which is always visible. 

The quantization noise variance $\sigma^2$ in the case of a uniform distribution 
can be derived for a simple truncation [21]. The least significant bit (LSB) 
after the truncation is denoted by $q = 2^{-b}$, where $b$ is the resulting integer 
word length and $k$ the number of truncated bits: 


$$\sigma^2 = \frac{q^2}{12} \left( 1 - 2^{-2k} \right)$$

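
The variance of the truncation error can be verified numerically. The sketch below assumes the expression $\sigma^2 = \frac{q^2}{12}(1 - 2^{-2k})$ with $q = 2^{-b}$ and interprets the values as unsigned fractions with b + k bits before and b bits after truncation; it enumerates all input values instead of drawing random samples, so the comparison is exact up to rounding.

```python
import statistics


def truncation_error_variance(b, k):
    """Closed-form variance q^2 / 12 * (1 - 2^(-2k)) with q = 2^(-b)."""
    q = 2.0 ** (-b)
    return q * q / 12.0 * (1.0 - 2.0 ** (-2 * k))


def empirical_truncation_variance(b, k):
    """Truncate every (b + k)-bit fraction to b bits and measure the error variance."""
    full_lsb = 2.0 ** (-(b + k))
    errors = []
    for value in range(2 ** (b + k)):
        truncated = (value >> k) << k   # drop the k least significant bits
        errors.append((value - truncated) * full_lsb)
    return statistics.pvariance(errors)


predicted = truncation_error_variance(4, 3)
measured = empirical_truncation_variance(4, 3)
print(predicted, measured)
```

For a large number of truncated bits the factor in parentheses approaches one and the expression reduces to the familiar $q^2/12$ of the uniform noise model.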
During the computation of the FFT, the variables grow with each butterfly 
stage, resulting from the addition inside the butterflies. The complex multiplications 
do not scale up the intermediate values, because they perform just a 
rotation in the complex plane and the twiddle factors are all normalized. Thus, 
the resulting word length of the FFT depends on the input data and grows by 
1 bit with each stage. In order to maintain a certain word length, the values 
can be scaled after each stage at the cost of additional quantization noise. 
A complete scaling of the input signal is disadvantageous and engenders an 
even higher level of quantization noise [22]. 

Furthermore, a quantization error is introduced after the multiplication, 
because the resulting word length is cut down by half and also the twiddle 
factors are represented with limited accuracy. However, it turns out that the 
coefficient errors are less severe than the round-off errors if the same word 
length is used for both the coefficients and signals [22]. 

The following analysis is based on [22], and only the most severe round-off 
errors are considered. The noise model used applies to a Radix-2 
decimation-in-frequency butterfly, which is used by the presented system. Furthermore, 
the signals are not scaled directly after the addition, but only after the 
multiplication. Therefore, only one noise source is present for each butterfly 
output. For the sake of simplicity, the error variance for both outputs is 
considered equal, even though only one output is the result of a multiplication. 
This approximation acts as an upper bound because the real output variance 
after the addition and the truncation will be slightly lower. 

The variance of the quantization error $\sigma_e^2$ after the multiplication is 
derived by decomposing the complex operation into four real multiplications, 
each truncated individually. In this case, the uniform noise model is applied 
and the number of truncated bits $k$ is assumed to be sufficiently large: 

$$\sigma_e^2 = 4 \cdot \frac{q^2}{12} = \frac{q^2}{3}$$

The total output variance is then calculated by adding all error variances 
contributing to the respective output. When observing the butterfly graph, 
a tree-like structure leads to each output, incorporating $N - 1$ butterfly nodes. 
However, if the signal is scaled after each stage, the accumulated noise 
decreases as well. In this case, the total noise variance $\sigma_N^2$ equals: 

$$\sigma_N^2 = \sigma_e^2 + 2 \cdot \frac{\sigma_e^2}{4} + 4 \cdot \frac{\sigma_e^2}{16} + \dots + \frac{N}{2} \cdot \frac{\sigma_e^2}{(N/2)^2} = \sigma_e^2 \left(1 + \frac{1}{2} + \frac{1}{4} + \dots\right) \approx 2 \sigma_e^2$$


Remarkably, the total noise variance is independent of the length of the FFT. 
However, when examining the signal-to-noise ratio (SNR) at the output, it 
turns out that the SNR decreases for longer FFT lengths, because the 
output is a scaled version of the FFT. Considering a random input signal 
with all values i.i.d. and a variance $\sigma_s^2$, the variance for each output 
of the FFT is scaled by $1/N$: 

$$\sigma_{s,\mathrm{fft}}^2 = \frac{\sigma_s^2}{N}$$

Composing the signal-to-noise ratio at the output leads to the expected result: 

$$\mathrm{SNR} = \frac{\sigma_{s,\mathrm{fft}}^2}{\sigma_N^2} = \frac{\sigma_s^2}{2 \sigma_e^2 N} = \frac{3 \sigma_s^2}{2 N q^2}$$

Consequently, if the FFT length N is doubled, the word length has to 
be increased by half a bit in order to maintain a constant 
signal-to-quantization-noise ratio (SQNR). To illustrate the influence of the 
word length, an exemplary radar measurement, processed with a scaled 
fixed-point FFT, is shown in Figure 7.12. 

Figure 7.12 Effects of different word lengths on the amount of quantization noise. 
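
The half-bit rule can be reproduced with a few lines of Python. The sketch below is a plain evaluation of the noise model derived in this section (per-butterfly variance q²/3, noise attenuated by 1/4 per subsequent scaling stage, output signal variance reduced by 1/N); the function names and the example parameters are illustrative assumptions, not values taken from the presented system.

```python
import math


def total_noise_variance(n_fft, sigma_e2):
    """Sum the noise of all butterflies feeding one output of a scaled Radix-2 FFT.

    The stage located j stages before the output contributes 2**j butterflies,
    each attenuated by 4**-j through the subsequent scaling steps.
    """
    stages = n_fft.bit_length() - 1  # log2(N) for a power-of-two length
    return sum((2 ** j) * sigma_e2 / 4 ** j for j in range(stages))


def sqnr_db(n_fft, word_length, sigma_s2=1.0):
    """SQNR at the FFT output for LSB q = 2**-word_length (model assumption)."""
    q = 2.0 ** -word_length
    sigma_e2 = q * q / 3.0
    sigma_n2 = total_noise_variance(n_fft, sigma_e2)
    return 10.0 * math.log10((sigma_s2 / n_fft) / sigma_n2)


if __name__ == "__main__":
    for n_fft, bits in [(1024, 12.0), (2048, 12.0), (2048, 12.5)]:
        print(n_fft, bits, round(sqnr_db(n_fft, bits), 2), "dB")
```

In this model, doubling the FFT length at a fixed word length lowers the SQNR by about 3 dB, and adding half a bit restores it.
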

Different word lengths have been used in order to illustrate the effect of 
the introduced quantization noise. The FFT is implemented in a Radix-2 DIF 
decomposition. The values are rounded and scaled after each stage. The black 
curve has been processed with double-precision floating-point arithmetic and 
acts as a reference. 

It can be observed that the fixed-point versions all lie above the reference. 
The reason is that the quantization noise power is added to the signal, 
and the amount of quantization noise is lowest for the floating-point 
version. Furthermore, it can be observed that the noise floor increases 
significantly in regions with low signal power. The difference per bit of word 
length is about 6 dB, which agrees with the derived noise model in the case 
of a truncation or rounding operation. In regions with more signal power, for 
instance around 5 m target range, the quantization noise effect is less severe, 
due to the higher SQNR. 

For a radar system application, it should be ensured that the added 
quantization noise does not deteriorate the total signal-to-noise ratio. The SNR 
is a key parameter for reliable target detection. Noise components arising from 
fixed-point computations should be clearly below the system noise floor in any 
case. It is important to consider the processing gain when designing an optimal 
word length, because the noise level drops for larger FFT lengths. Thus, the 
maximum possible FFT length can be considered as worst-case scenario when 
designing the word length of the FFT. 


7.3.3 Rank-Only OS-CFAR Accelerator 


The CFAR processing step requires the use of a local window for threshold 
calculation. For a streaming application, a sliding window exploits the locality 
of the data and can be used easily without excessive memory transfers. It is 
implemented with the help of a shift register. Current FPGA devices offer 
several different building blocks for this purpose, namely Block RAMs, lookup 
tables (LUTs) and ordinary flip-flops. For the presented OS-CFAR architecture, 
all signal values inside the window need to be accessed at once. Hence, a data 
tap is required at each position of the shift register and solely flip-flops can be 
used for its realization. 

As described in Subsection 7.2.3, the rank-only detection step depends on 
N comparisons, a binary sum, and a comparison for the decision. Each register 
of the sliding window is routed to a dedicated comparator, whose second input 
is fed by the CUT with a threshold value applied. The comparison result is 
routed to a binary adder with N inputs. Several LUTs are cascaded for this 
step, which can impose an upper limit on the clock frequency. In order to 
maximize performance, it is implemented in two steps, i.e., the lower and the 
upper half of the window are summed up separately before the final rank is 
computed. 
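
The data flow of the rank-only detection can be mirrored in software. The following Python sketch is a behavioral model only, not the FPGA implementation: the window size, the number of guard cells, the required rank `k`, and the scaling factor `alpha` are hypothetical parameters, and the function name is illustrative. Each window cell is compared with the scaled CUT, the comparison results of the two window halves are summed separately, and the resulting rank is compared with `k`.

```python
import numpy as np


def rank_only_os_cfar(power, window=16, guard=2, k=12, alpha=4.0):
    """Behavioral model of a rank-only OS-CFAR detector (assumed parameters).

    A cell under test (CUT) is declared a detection if at least k of the
    window cells are smaller than the CUT divided by alpha. This is equivalent
    to comparing the CUT with alpha times the k-th smallest window value,
    but requires no sorting.
    """
    power = np.asarray(power, dtype=float)
    half = window // 2
    detections = np.zeros(power.size, dtype=bool)
    for i in range(half + guard, power.size - half - guard):
        lower = power[i - guard - half : i - guard]          # cells before the CUT
        upper = power[i + guard + 1 : i + guard + 1 + half]  # cells after the CUT
        scaled_cut = power[i] / alpha                        # CUT with threshold applied
        # N comparisons, summed separately for the two window halves
        rank = np.count_nonzero(lower < scaled_cut) + np.count_nonzero(upper < scaled_cut)
        detections[i] = rank >= k                            # final decision
    return detections


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectrum = rng.exponential(1.0, 256)  # noise-only power spectrum
    spectrum[100] = 100.0                 # one strong synthetic target
    print(np.flatnonzero(rank_only_os_cfar(spectrum)))
```

Because only the rank is evaluated, the model never orders the window values, which corresponds to the comparator and adder structure described above.
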

The described architecture has been implemented on a Virtex-7 FPGA 
and the resulting resource usage has been analyzed. For window sizes 
up to 128, an operating frequency of 250 MHz could be achieved by this 
implementation. The LUT usage depending on the number of channels is 
depicted in Figure 7.14. 

Figure 7.13 Architecture of the rank-only OS-CFAR accelerator. 

As expected, the CFAR processing part (greenish blue color in Figure 7.14) 
is practically independent of the number of channels, because the NCI step 
is performed in advance. The NCI step by itself scales approximately with 
log N, which is a result of the tree structure used. For a number of channels 
above 32, the raw data buffer which compensates the pipeline delay consumes 
more LUTs than the CFAR processing part. It grows linearly with the number 
of channels and is thus the dominating part for large channel numbers. The 
usage of a dedicated Block RAM can be considered if the number of LUTs is 
scarce. 

Figure 7.14 Resource usage against number of channels for a constant window size (128 
cells). [Plot legend: Other, CFAR, NCI, Buffer.] 

Figure 7.15 Resource usage against window size for different numbers of channels. 
[Two panels plotted over the CFAR window size for 1, 4, and 16 channels; one 
vertical axis is labeled "Number of Flip-Flops".] 

The scaling behavior in relation to the window size turns out to be nearly 
linear (cf. Figure 7.15). It is clearly dominated by the N comparators as well 
as the data buffer equalizing the pipeline delay. The number of channels has a 
much lower effect on LUT resource usage than the window size. For instance, 
the resource usage is within the same order of magnitude when comparing one 
and 32 channels. The architecture can be considered very efficient for large 
channel numbers and is thus suitable for MIMO systems. It can be concluded 
that the usage of NCI before the actual CFAR processing is beneficial in two 
ways: it improves detection performance and reduces resource requirements 
at the same time. 


7.4 Conclusion 


A data processing architecture for future automotive MIMO radar systems has 
been presented in this chapter. Besides the algorithmic background information, 
a focus has been set on the target detection with the help of CFAR processing. 
Attention has been paid to real-time requirements as well as resource usage. 
The step between the target detection and the subsequent angular processing 
could be identified as a good data interface between different processing units, 
each optimized for different requirements on control flow complexity and data 
throughput. 

Furthermore, an FPGA-based implementation of the raw data preprocessing 
chain has been presented and investigated. Several parameters could be 
identified as crucial points in the design procedure. In particular, the maximum 
length of the FFTs and the expected dynamic range of the signals essentially 
determine the resource usage in terms of logic elements and memory size. These 
parameters have a strong dependency on the modulation waveform used, 
which is why the design of the signal processing architecture has to be 
integrated into the overall radar system design process. With the help of 
model-based design space exploration methods, the estimation of resource 
requirements is feasible, even in an early development stage. The derivation 
of appropriate models from the realized hardware implementation will be part 
of future work. 

The design methodology used, which evolved from the DESERVE project, 
turned out to be very efficient in terms of performance and development time. 
The usage of heterogeneous platforms, even in an early prototype system, 
made it possible to handle the tremendous amount of data in real-time. Thanks 
to the integration with established tools like ADTF and Matlab, the system is 
ready to be integrated into a test vehicle with a multiplicity of sensor devices. 
Finally, the early availability of such high resolution automotive radar sensors 
can be an important step on the way towards automated driving. 


8.1 Introduction 


Many car accidents involving vulnerable road users (e.g., pedestrians or 
cyclists) occur on rural roads after dark, when the driver's visibility is 
restricted. Thus, the main objective of augmented night vision is to assist 
the driver when driving on side roads (e.g., highways, country roads, or rural 
roads) with poor or restricted visibility by alerting the driver to potential 
obstacles ahead. 

One possible augmentation of driver vision is to highlight potential 
obstacles, hazards or vulnerable road users in the live video of the road ahead. 
A classification of image content is mandatory for this application. As the 
augmentation enables the driver to grasp the situation quickly, the distance to 
the detected object has to be calculated by stereo vision to ensure accuracy 
and speed of assessment. 

As the range of distance resolution increases with the baseline of a 
stereo system, a wide-baseline stereo system is necessary to facilitate the 
augmentation of objects in the desired range. Rigidly coupling such a 
wide-baseline stereo system is sometimes not practicable; therefore, the cameras 
are mounted individually, e.g., to the windshield. Physically separated cameras 
increase the camera baseline; however, a moving car gives rise to multiple 
vibration sources [1] which misalign the images of the separated cameras. 
Therefore, online camera calibration is indispensable for further image 
processing. This online camera calibration covers the reconstruction of 
extrinsic camera parameters, which relies on a sparse pixel correspondence list 
from the two camera images. The general overview of the algorithmic flow is 
depicted in Figure 8.1. This chapter will focus on the search for sparse pixel 
correspondences and extraction of camera calibration parameters. 

The remaining chapter is set up as follows. Section 8.1 gives an introduction 
to the self-calibration of wide-baseline stereo cameras. After a review of 
the considered algorithms in Section 8.2, Section 8.3 details the class of image 
feature detectors and extractors. Section 8.4 highlights the matching of image 
features. An in-depth description of the bundle adjustment for the camera 
calibration is given in Section 8.5. In Section 8.6, selected application-specific 
aspects regarding the algorithmic parameterization are presented. Section 8.7 
focuses on algorithm-specific and hardware-specific implementation details 
and gives an overview of existing implementations for the extraction of image 
features. 


8.1.1 Extraction of Image Features 


Image feature extraction consists of two steps: the detection of image features 
and the generation of the descriptor for those feature points, which results in 
a unique signature as a representation for the detected feature points. 

Figure 8.1 Algorithmic overview. Input of the processing chain is a stereo image pair, in 
which sparse pixel correspondences are extracted for online camera calibration. After the 
calibration, rectification is performed as a preprocessing step for disparity estimation. 

The image feature detection generates a list of distinctive invariant points 
in images for the feature localization. Especially for camera calibration, a high 
accuracy of localization is required [2] in order to ensure a correct functionality 
of following algorithmic steps, e.g., the rectification of stereo image pairs. Due 
to the similarity between the views of the scene, a rotation invariance or scale 
invariance of the feature descriptors supports stability of the matches. This is, 
however, not mandatory, because characteristic points in image pairs of the 
used stereo camera configuration rarely change their rotation or scale abruptly 
from left to right stereo image. 

In recent years, three different principles for feature detection have 
proven employable. Corner or edge detectors extract characteristic corners 
or edges in an image, which are defined by large gradient changes of image 
intensities. So-called blob detectors determine pixel positions, for which 
a circular local neighborhood is approximately constant or similar for a 
defined image property [3]. Furthermore, affine invariant detectors have been 
adapted to be invariant to affine transformations, which are approximations 
to perspective distortions in order to achieve invariance to large changes in 
viewpoint [4]. The detected features of the exemplary SIFT feature detector 
are shown in Figure 8.2. 

Figure 8.2 Left (top) and right (bottom) image from a stereo camera system showing detected 
SIFT image features. Detected feature points of the left/right image are displayed in red/green, 
matches are displayed in blue. Scale and rotation of the SIFT features are illustrated by the 
circle properties. 

The descriptor of an image feature characterizes the detected feature point. 
Ideally, a feature descriptor of a world point is unique when compared to 
other descriptors, but identical for the same world point in different views [5]. 
Two representations for descriptors have been established in recent years. 
So-called histogram-based or distribution-based descriptors represent the local 
neighborhood of a feature point by histograms of local image properties like 
pixel intensities, color, texture, edges etc. [3]. Furthermore, binary descriptors 
represent a local pixel region by storing the binary result of predetermined 
pixel-level intensity comparisons [6]. In contrast to distribution-based 
descriptors, binary descriptors contain a more compact representation of the 
image patch around a feature. 

In general, extracted image features have to cope with various influences. 
Firstly, there are disruptive effects related to the image quality, e.g., image 
compression, image noise, image blur due to zoom or exposure. Secondly, 
there are influences resulting from the content of the stereo image pair, e.g., 
illumination, difficult viewpoint conditions or occlusions, background clutter 
and general content changes, perspective changes or changes in the view point 
of planar and non-planar geometry [6, 7]. Finally, application-specific factors 
such as scale and rotation of objects impact the algorithmic results dealing 
with image features. Thus, extracted image features have to be invariant to as 
many disturbing influences of the named categories as possible. 

The large variety of image feature detectors and descriptors clearly shows 
the manifold approaches to defining and describing characteristic points in 
images. As S. Gauglitz mentioned in [5], “there is no clear-cut definition 
as to what makes a point interesting. Detection of such points is only an 
intermediate step in any application”. There is no general answer to the 
question of which detector or descriptor performs best. Therefore, 
as J. Shi and C. Tomasi postulated in 1994, “the right features are exactly 
those that make the tracker work best” [8]. Consequently, “any set of feature 
points is acceptable, but the result ought to be consistent, e.g., in images 
that show the same scene, the algorithm should detect the same points.” 
[5]. In other words, for each application, the best performing combination of 
image feature detector and extractor has to be found. Furthermore, 
application-specific conditions (here: high localization accuracy with low 
requirements to scale and rotation invariance) restrict the possible algorithmic 
combinations. 

A survey of existing image feature detectors and descriptors will be given 
in Section 8.2. A more detailed presentation of an exemplary feature detector 
and descriptor called SIFT (Scale-Invariant Feature Transform) [9], which 
shows good results in this application, will be given in Section 8.3. 


8.1.2 Matching of Image Features 


Matching image features results in a list of pixel correspondences between 
the left and right input image of the stereo image pair. The main challenge 
is on the one hand to find as many corresponding pixels as possible while 
avoiding wrong pixel assignments on the other, even if there are several 
similar regions in both input images. The assignment of image features to 
pixel correspondences is based on feature descriptors, which are used to find 
the maximum similarity between the extracted image features. Depending 
on the representation of the features (histogram-based or binary descriptor), 
the similarity is computed by various vector norms for the distance of 
two matching candidates or the Hamming distance. Furthermore, different 
matching methods have a significant impact on the resulting correspondence 
lists [3]. 

In the case of global feature matching methods $f : X \to Y$, two feature 
points $x \in X$ and $y \in Y$ are assigned by local similarity, which is 
determined by the related descriptors $d_x$ and $d_y$. For each descriptor in set Y, there 
is a corresponding descriptor in set X with a minimal error criterion. After 
the assignment of feature points, the correspondences are filtered by this error 
criterion in order to avoid false correspondences, e.g., feature points which 
are not detectable in both images because of occlusions in one image. Varying 
matching methods differ in the error criterion for the evaluation of feature 
similarity and the search algorithm during the matching step. 
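
The assignment and filtering steps can be sketched in Python. This is a minimal brute-force illustration, assuming NumPy, the Euclidean norm as error criterion for histogram-based descriptors, and a user-chosen bound `max_error`; the function names are hypothetical and the sketch does not represent the matching method selected later in this chapter.

```python
import numpy as np


def match_descriptors(desc_x, desc_y, max_error):
    """Assign each descriptor in Y to its nearest descriptor in X.

    Returns (index_in_X, index_in_Y, error) tuples. Assignments whose error
    exceeds max_error are discarded, e.g., points occluded in one image.
    """
    desc_x = np.asarray(desc_x, dtype=float)
    desc_y = np.asarray(desc_y, dtype=float)
    # Pairwise Euclidean distances, shape (len(Y), len(X))
    dist = np.linalg.norm(desc_y[:, None, :] - desc_x[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    errors = dist[np.arange(len(desc_y)), nearest]
    return [(int(nearest[j]), j, float(errors[j]))
            for j in range(len(desc_y)) if errors[j] <= max_error]


def hamming_distance(a, b):
    """Number of differing bits between two binary descriptors (packed uint8)."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


if __name__ == "__main__":
    left = np.array([[0.0, 0.0], [10.0, 10.0]])
    right = np.array([[0.1, 0.0], [9.9, 10.0], [50.0, 50.0]])
    print(match_descriptors(left, right, max_error=1.0))
```

For binary descriptors, the Euclidean norm in `match_descriptors` would be replaced by `hamming_distance`; the filtering by the error criterion stays the same.
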


8.1.3 Extrinsic Online Self-Calibration 


Common stereo algorithms for disparity estimation (e.g., [10]) rely on 
exact knowledge about the intrinsic (e.g., focal length) and extrinsic camera 
parameters (the transformation between two cameras). Calibration errors 
lead to erroneous reconstruction values. The camera parameters enable the 
rectification, which is the projection of the camera images to a common image 
plane and they form the basis for further processing. 

The intrinsic parameters may be assumed to be constant and identified 
using an offline calibration procedure (e.g., [11]). As the cameras are not 
rigidly coupled here, the extrinsic parameters vary due to vibrations in the 
car and are assumed to change rapidly from frame to frame. Thus, a one-time 
offline calibration procedure does not suffice to meet the accuracy 
requirements of stereo processing, and an online calibration procedure is 
necessary. While driving, the use of calibration targets with known geometry is 
difficult. Therefore, a self-calibration mechanism is needed. 

The idea behind online self-calibration procedures is to estimate the camera 
parameters based on what is perceived in both cameras. A preprocessing 
step to the calibration is therefore a one-to-one identification of scene points 
visible in both camera images, i.e., a list of sparse pixel correspondences of 
the stereo camera images. 


8.2 Algorithmic Overview 


Many approaches have been proposed in recent years for the extraction and 
matching of image features and for the feature-based camera self-calibration. 
In the following section, selected aspects for each algorithmic step are 
reviewed separately. 


8.2.1 Survey of Image Features Extraction 


The process of extracting image features is split into two algorithmic parts, 
the detection of feature points and the generation of the feature descriptor. For 
both steps, a large number of algorithms have been published. In this section, 
typical examples of each algorithmic step are presented. 


8.2.1.1 Detection of features 

Which properties of distinctive image points are mandatory for a satisfactory 
matching of image features depends on the final application. There is no clear 
definition as to which extraction strategy is best, as it only needs to provide 
sufficient algorithmic performance during retrieval in the same scene on image 
sequences from different viewpoints. Therefore, what is characteristic for 
highly distinctive points in images is application-specific, which 
has led to four basic methods for extracting retrievable points in images. 


Edge detection 

Edges are stable features, which are detectable over a range of viewpoints 
and illumination changes [12]. An edge, e.g., the border of an object, is 
defined by discontinuities in pixel intensities in a single image dimension (see 
Figure 8.3(b)). For example, the Canny detector [13] determines the gradient of 
the input image with the Sobel operator; by evaluating the magnitude and 
orientation of the gradients, the edge's direction and its strength can be 
extracted. Gradient and direction are used in a non-maximum suppression in 
order to suppress ambiguous edges in the local neighborhood of a possible edge. 

Figure 8.3 Detection of edges and corners by image gradients. The blue circle shows a 
possible feature point, surrounded by a local neighborhood. (a) Low image gradients in two 
spatial directions represent texture free image areas. (b) A high image gradient in one spatial 
direction indicates a possible edge, (c) in two spatial directions a possible corner. 

The drawback of this method is the ambiguity of the detected feature 
points. As depicted in Figure 8.3(b), it is not clear which detected points 
along the edge correspond to each other when matching two detected feature 
points, which leads to incorrect pixel correspondences. 
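
The gradient step of such an edge detector can be illustrated with a short Python sketch. It assumes NumPy and covers only the Sobel gradient with its magnitude and orientation; the smoothing, non-maximum suppression, and hysteresis stages of the Canny detector are deliberately omitted, and the function names are illustrative.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    """Correlate an image with a 3x3 kernel (valid region only)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out


def sobel_gradient(img):
    """Return gradient magnitude and orientation (radians) of a grayscale image."""
    img = np.asarray(img, dtype=float)
    gx = filter3x3(img, SOBEL_X)  # intensity change along the image columns
    gy = filter3x3(img, SOBEL_Y)  # intensity change along the image rows
    return np.hypot(gx, gy), np.arctan2(gy, gx)


if __name__ == "__main__":
    step = np.zeros((5, 6))
    step[:, 3:] = 1.0             # vertical step edge
    magnitude, orientation = sobel_gradient(step)
    print(magnitude)
```

For the vertical step edge in the usage example, every point along the edge produces the same magnitude and orientation, which illustrates the ambiguity discussed above.
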


Corner detection 

Corners are defined as intersections of edges or as pixel discontinuities in two 
or more image directions (see Figure 8.3(c)). In addition to simple corners, line 
endings and cropped intensity changes are detected using this type of detector. 

One early corner detector is the Harris corner detector [14] (1988), which 
approximates the sum of squared differences of two image patches in order 
to detect a difference in image intensities. The approximation results in the 
second moment matrix, which represents the dominant directions of a local 
neighborhood in the gradient image. With this approach it is not only possible 
to detect corners, but edges as well. 
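
How the second moment matrix separates corners from edges can be shown with a compact Python sketch. It assumes NumPy, central differences for the image derivatives, an unweighted box window instead of a Gaussian window, and the empirical constant `kappa = 0.05`; these choices and the function names are assumptions made for illustration.

```python
import numpy as np


def box_sum(values, radius):
    """Sum values over a (2*radius+1)^2 window around each pixel (zero padded)."""
    padded = np.pad(values, radius)
    h, w = values.shape
    out = np.zeros((h, w))
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out


def harris_response(img, radius=1, kappa=0.05):
    """Corner response det(M) - kappa * trace(M)^2 of the second moment matrix M."""
    img = np.asarray(img, dtype=float)
    iy, ix = np.gradient(img)            # derivatives along rows and columns
    sxx = box_sum(ix * ix, radius)
    syy = box_sum(iy * iy, radius)
    sxy = box_sum(ix * iy, radius)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - kappa * trace ** 2      # > 0: corner, < 0: edge, ~ 0: flat


if __name__ == "__main__":
    image = np.zeros((12, 12))
    image[6:, 6:] = 1.0                  # bright quadrant with one corner
    response = harris_response(image)
    print(np.unravel_index(response.argmax(), response.shape))
```

Two large eigenvalues of the matrix give a positive response (corner), one large eigenvalue gives a negative response (edge), and a flat region gives a response close to zero.
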

To avoid such costly filters, a detector has been presented that does not 
rely on discrete image derivatives, but on the number of intensity differences 
between pixels [5], which are located on a Bresenham circle (see Figure 8.4). 
Rosten [15] sped up this process by reducing the number of pixel tests with 
machine learning techniques to find the fastest sequence of pixel comparisons 
for rejecting a wrong corner candidate. 
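
A plain version of such a segment test can be written in a few lines of Python. The sketch assumes a Bresenham circle of radius 3 with 16 pixels, a run length of `n = 9`, and a threshold of 20 gray levels, which are values commonly associated with the FAST detector but are assumptions here. It tests all 16 pixels exhaustively and does not include the learned ordering of pixel comparisons mentioned above.

```python
import numpy as np

# (dx, dy) offsets of the 16 pixels on a Bresenham circle of radius 3
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def longest_circular_run(flags):
    """Length of the longest run of True values on a closed ring."""
    run = best = 0
    for flag in flags + flags:            # doubled list handles the wrap-around
        run = run + 1 if flag else 0
        best = max(best, run)
    return min(best, len(flags))


def is_corner(img, y, x, threshold=20, n=9):
    """Segment test: n contiguous ring pixels all brighter or all darker."""
    center = int(img[y, x])
    ring = [int(img[y + dy, x + dx]) for dx, dy in CIRCLE]
    brighter = [value > center + threshold for value in ring]
    darker = [value < center - threshold for value in ring]
    return longest_circular_run(brighter) >= n or longest_circular_run(darker) >= n


if __name__ == "__main__":
    image = np.zeros((9, 9), dtype=np.uint8)
    image[4:, 4:] = 200                   # bright quadrant, corner at (4, 4)
    print(is_corner(image, 4, 4))
```

The candidate pixel must lie at least three pixels away from the image border, since the sketch performs no bounds checking.
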

The matching of detected corners in different images of the same scene 
provides correct pixel correspondences as long as the detected corners belong 
to objects of the same size. A corresponding corner is only detectable in 
different images if the regions for describing the corners have similar 
dimensions (see Figure 8.5, red circle), which depends on the object size. To 
overcome this problem, repeated image scaling or an object-size-dependent 
adjustment of the region for the descriptor generation is possible. 

Figure 8.4 Intensity comparisons of pixels which are located on a Bresenham circle. The 
central pixel is determined as a corner if a certain number of contiguous pixels are 
brighter or darker than the central pixel. This is combined with an adaptable threshold to 
avoid instabilities. 


Blob detection 

A blob is a region of connected pixels, which share a common image property, 
e.g., pixel intensities, and therefore stand out from surrounding regions. By 
formulating image properties as a function of pixel positions, local maxima 
and minima of the function are determinable. 


Figure 8.5 Detection of corners at different image scales. With strongly different object sizes 
in the image, a corresponding corner is not detectable (red circle) unless the image is 
scaled repeatedly. 

Figure 8.6 Blob detector. The detected blobs are displayed as red circles. The blob's size is 
displayed as the diameter of the circle. 

It has been shown that the Laplacian of Gaussian (LoG) [16] has a strong 
response to dark and bright image regions, which are detectable as blobs. 
The response is highly dependent on the size of the filter kernel used (see 
Figure 8.6). 
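
The dependence of the LoG response on the kernel size can be made tangible with a small Python experiment. The sketch assumes NumPy, a kernel truncated at four standard deviations, and a scale normalization by sigma squared; the function names and the synthetic test image are illustrative assumptions.

```python
import numpy as np


def log_kernel(sigma):
    """Sampled Laplacian of Gaussian, truncated at 4 sigma and made zero-mean."""
    radius = int(np.ceil(4 * sigma))
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    rr = x * x + y * y
    gauss = np.exp(-rr / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    kernel = (rr - 2.0 * sigma ** 2) / sigma ** 4 * gauss
    return kernel - kernel.mean()         # flat regions then give zero response


def log_response_at(img, y, x, sigma):
    """Scale-normalized LoG response at one pixel (kernel must fit in the image)."""
    kernel = log_kernel(sigma)
    radius = kernel.shape[0] // 2
    patch = img[y - radius:y + radius + 1, x - radius:x + radius + 1]
    return sigma ** 2 * float((kernel * patch).sum())


if __name__ == "__main__":
    yy, xx = np.mgrid[0:81, 0:81]
    disk = ((xx - 40) ** 2 + (yy - 40) ** 2 <= 36).astype(float)  # radius 6
    for sigma in (2.0, 3.0, 6.0 / np.sqrt(2.0), 6.0, 8.0):
        print(round(sigma, 2), round(log_response_at(disk, 40, 40, sigma), 3))
```

For a bright disk the response at its center is negative, and its magnitude is largest when sigma is close to the disk radius divided by the square root of two, which is why blob detectors evaluate the filter at several kernel sizes.
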


Affine-invariant interest point detection 

Image features based on a blob detector hardly match for large scale or 
viewpoint changes [4], because circular image patches for blob feature 
extraction lead to large distance measures during blob feature matching, 
since the circular regions overlap only partially (see Figure 8.7). With circular 
image patches, the image information used is too different to ensure stable 
pixel correspondences for large viewpoint changes. Therefore, Mikolajczyk 
[7] extends blob detectors to affine invariance by estimating the affine shape of 
a local neighborhood. For affine transformations, the scale of an image region 
changes differently in each direction, which leads to differing local regions 
for the blob detection and therefore to differing localization or to mistaken 
detections. 


Figure 8.7 Blob detection based on circular image region for a scene with a large viewpoint 
change. The region on which the blob feature extraction is based only partially covers the 
corresponding region and thus, will lead to non-matching image features. 

In order to deal with affine transformation, Mikolajczyk [7] replaces the 
blob detection scales, which are equal in all directions, by affine detection 
scales, which vary independently in orthogonal directions. Hereby, the circular 
point neighborhood is replaced by an ellipse, which is determined by the 
second moment matrix. With the affine normalization, the ellipse is normalized 
to a circle again and a blob is detectable within the transformed image patch 
(see Figure 8.8). 

Since the four presented methods provide large differences in quantity and 
quality for detected interest points, a suitable algorithm has to be chosen with 
regard to the application. 

In 3D reconstruction, precise localization of interest points is one major 
aspect [4]; therefore, sub-pixel accuracy for feature detection is mandatory. 
Self-occlusion occurs very frequently in real-world scenes and typically 
many interest points are found near occlusion boundaries, so accurate 
positioning of features is imperative. As has been shown in many publications, 
center-oriented detectors (e.g., LoG, DoG or CenSurE) [5] provide a higher 
and more stable repetition rate than corner or edge detectors. Furthermore, 
affine-invariant interest point detectors have been adapted to be robust 
to large changes in viewpoint [4], which is of minor importance for 
reliable image feature matching, even for a wide-baseline stereo camera 
system. 

Taking into account the algorithmic robustness of the presented methods 
for the detection of image features and the high requirements of ADAS 
(Advanced Driver Assistance Systems), a blob detector is used for the 
detection of features henceforth. In subsection 8.3.1 the SIFT-detector [9] 
will be presented in detail as an exemplary blob detector. 


Figure 8.8 Affine-Invariant Interest Point Detection. The circular point neighborhood is 
replaced with an ellipse in order to achieve independent orthogonal varying detection scales 
for interest point detection. Before applying a detection algorithm, the local neighborhood is 
affine normalized, which results in a circular neighborhood and a transformed image patch 
(from [7]). 


8.2.1.2 Description of features 

After the detection of interest points, the descriptor as a unique 
representation of an image feature has to be generated. In addition to 
histogram-based descriptors, which are memory-intensive, binary descriptors have 
been established as a more compact representation for image features. Moreover, 
the distance between two binary descriptors, which is required for feature 
matching, is faster to compute than for histogram-based descriptors. There are 
other techniques to describe image features such as image patch correlation 
or generalized moment invariants [3]; however, the focus of this section is 
limited to the two mentioned descriptor types, due to their suitability for the 
self-calibration of wide-baseline stereo camera systems. 


Histogram-based descriptors 

A simple way to describe a detected blob in a histogram-based manner is the 
distribution of pixel intensities of the local blob region. Due to the fact that 
this technique is prone to illumination changes, more complex approaches 
have been presented (see [3]), e.g., the distribution of gradient locations and 
orientations in the local blob area instead of the distribution of pixel intensity 
itself. In the case of the SIFT descriptor, the coordinates of the descriptor 
and the gradient orientations are rotated relative to the feature orientation and 
afterwards, a histogram is generated based on orientation and magnitude of 
the image gradient [9]. Furthermore, the quantization granularity of gradient 
locations and orientations leads to a robust descriptor, which is stable to 
small geometric distortions and small errors in the blob region. Besides 
multiple techniques for histogram generation, different sampling grids have 
been introduced (see Figure 8.9). The resulting descriptor is a 
multidimensional vector with the histogram's bins as components. In the case of 
SIFT, each vector consists of 128 values of floating point precision. The size 
of a feature vector is highly dependent on the algorithmic parameters, but 
nevertheless histogram-based descriptors usually have high memory requirements. 
Therefore, techniques for a more compact descriptor representation have been 
developed, e.g., principal component analysis for PCA-SIFT [17]. 

Figure 8.9 Sampling grids for generating different descriptors: (a) SIFT [9], (b) Shape 
Context [18], (c) DAISY [19]. 


Binary descriptors 
Because histogram-based descriptors have a high computational complexity [3]
and high memory requirements [6], a faster generation and a more compact
representation of feature descriptors are desirable. Binary descriptors meet
this need; they are characterized by sampling patterns and predefined
sampling pairs. A sampling pattern defines a set of potential sampling
locations (Figure 8.10, blue circles), whose image information is optionally
smoothed with spatially dependent filter kernels (e.g., Gaussian smoothing)
(Figure 8.10, red circles). A fixed combination of the filtered intensities is
selected in advance as descriptor-specific sampling pairs (see Figure 8.11,
two variations of sampling pairs for the FREAK descriptor).

Figure 8.10 Sampling pattern. (a) BRISK descriptor, (b) FREAK descriptor [21]. Sampling
patterns define a set of sampling locations (blue circles), whose image information is
smoothed with spatially dependent filter kernels (red circles). Out of the sampling pattern, the
sampling pairs for the binary tests for the descriptor generation are selected.

Figure 8.11 Two variations of sampling pairs of the FREAK descriptor [21]. A fixed
combination of sampling locations is selected as descriptor-specific sampling pairs, with which
the binary tests for the descriptor generation are performed.

For each sampling pair, a binary test $\tau$ is performed, e.g.,
(BRIEF [20]):

$$
\tau(p; x, y) := \begin{cases} 1 & \text{if } I(p, x) < I(p, y) \\ 0 & \text{otherwise} \end{cases}
$$

where $I(p, x)$ is the pixel intensity in a smoothed image patch $p$ around
an image position $x = (u, v)^T$. Such binary tests are performed on a set of
$n_d$ precomputed pixel pairs. The resulting descriptor of dimension $n_d$
is given by

$$
f_{n_d}(p) := \sum_{1 \le i \le n_d} 2^{i-1} \, \tau(p; x_i, y_i)
$$

Typically, a binary descriptor has a maximum length of 512 bits.
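As a minimal sketch of the binary test and the resulting bit string, the
following Python code evaluates the intensity comparison on an already smoothed
patch. The function names and the uniformly random choice of sampling pairs are
assumptions made for illustration only; BRISK, FREAK and other descriptors use
their own fixed sampling patterns.

```python
import numpy as np

def binary_test(patch, x, y):
    """tau(p; x, y): 1 if I(p, x) < I(p, y), else 0; x and y are (u, v) positions."""
    return 1 if patch[x[1], x[0]] < patch[y[1], y[0]] else 0

def binary_descriptor(patch, pairs):
    """Evaluate all n_d binary tests and return them as an array of 0/1 bits."""
    return np.array([binary_test(patch, x, y) for x, y in pairs], dtype=np.uint8)

def random_pairs(n_d, size, seed=0):
    """Hypothetical sampling pairs: random positions inside a size x size patch."""
    rng = np.random.default_rng(seed)
    points = rng.integers(0, size, size=(n_d, 2, 2))
    return [(tuple(a), tuple(b)) for a, b in points]
```

Packing the bits with `np.packbits` stores a 512-bit descriptor in 64 bytes,
which illustrates the compactness compared to 128 floating point values.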


8.2.1.3 Characteristics of features 

Invariance to rotation and scale increases the detection rate of features in
similar views of a scene and ensures the distinctiveness of the detected feature
points. By assigning a region-based main orientation, a feature is rotated by this
orientation in order to match it with a corresponding feature from a different
orientation. Furthermore, objects often vary in size in different images, which
leads to differing image regions for the description of the same feature. To unify
the descriptor generation, Lindeberg's [16] scale-space theory is applied.


Rotation invariance of a feature descriptor is achieved by rotating the
sampling grid or sampling pattern for the pixel area which is used for the
descriptor generation by the main orientation before the descriptor is extracted
(see Figure 8.12) or by rotating the descriptor itself. To determine the main
orientation, different approaches are available. Rublee et al. [22] use intensity
centroids to determine the main orientation of a patch, whereas Leutenegger
et al. [23] use the gradient of predefined sampling pairs to rotate the sampling
pattern. Further techniques are available in the literature (e.g., [9, 21, 24]).

Figure 8.12 Rotation invariance is achieved by rotating the sampling grid by the main
orientation before extracting the descriptor.


Scale invariance of image features is attained by applying Lindeberg's
[16] scale-space theory for image processing to the input images while
detecting image features. The input image is subsampled multiple times to
generate different scales of the input image, and the detection step is
repeated on each scale. If the same feature candidate is detected on multiple
scales, the candidate on the scale with the highest information content is
selected in order to achieve scale invariance (see Figures 8.13 and 8.14).
Lowe (SIFT, [9]) approximates Lindeberg's LoG scale-space with differences of
Gaussian-smoothed images, which reduces the complexity significantly.

Figure 8.13 Scale-space. An input image is downsampled to achieve multiple scales of the
image. On each scale, feature candidates are found, whereas repeated candidates are removed.
The scale with the highest information content for the feature candidate is selected as the feature
scale (from [16]).

Figure 8.14 Multi-scale approach for blob detection. The same blob with differing scales in
two images and the related response (normalized Laplacian of Gaussian) over scales is shown.
The scale with the highest information content is chosen as a blob (from [7]).

A further approach for scale invariance is the detection and later suppres- 
sion of feature candidates which are detected on multiple scales, but have the 
same image position. Those repeated nominations are compensated by a non- 
maximum suppression [6], which evaluates a predefined cornerness score and 
selects the most unique feature point. 

Image feature detection and description are not completely independent.
By choosing a certain feature detector, a specific local neighborhood is
used to detect interesting points. This specific local neighborhood also has
to be employed to extract the feature descriptor in order to ensure a reliable
description of the image patch. Although it seems to be a promising approach, it
is not advisable to combine any detector with any descriptor [4]. The following
overview (see Tables 8.1 and 8.2) of selected state-of-the-art feature detectors
and feature descriptors with references is not intended to be exhaustive,
but gives an impression of how many different detectors and descriptors
are available and therefore combinable. For an appropriate performance,
each algorithm requires an application-specific parameterization, which may
depend on the previous and following processing steps. Thus, this large number
of degrees of freedom results in an algorithmic variety that can hardly be
surveyed in its entirety.

Table 8.1 Overview of feature detectors

Feature Detector   Year   Comment
SIFT [9]           1999   Scale-Invariant Feature Transform;
                          scale-space based, invariant to scale and rotation
SURF [25]          2008   Speeded Up Robust Features;
                          scale-space based, invariant to scale and rotation
KAZE [24]          2012   Non-linear scale-space based;
                          invariant to scale and rotation
A-KAZE [26]        2013   Accelerated-KAZE;
                          improved KAZE feature detector
BRISK [23]         2011   Binary Robust Invariant Scalable Keypoints;
                          scale-space based, invariant to scale and rotation
FAST [15]          2006   Features from Accelerated Segment Test;
                          segment based corner detector
ORB [22]           2011   Oriented FAST and Rotated BRIEF;
                          advanced from FAST and BRIEF (see descriptors)


Table 8.2 Overview of feature descriptors

Feature Descriptor   Year   Comment
SIFT [9]             1999   Scale-Invariant Feature Transform;
                            histogram-based descriptor
SURF [25]            2008   Speeded Up Robust Features;
                            histogram-based descriptor
KAZE [24]            2012   Non-linear scale-space based;
                            histogram-based descriptor
A-KAZE [26]          2013   Accelerated-KAZE;
                            binary descriptor
BRISK [23]           2011   Binary Robust Invariant Scalable Keypoints;
                            binary descriptor
BRIEF [20]           2012   Binary Robust Independent Elementary Features;
                            binary descriptor
ORB [22]             2011   Oriented FAST and Rotated BRIEF;
                            advanced from FAST and BRIEF (see detectors)
DAISY [19]           2010   Dense Descriptor for Wide Baseline Stereo Matching;
                            histogram-based descriptor
FREAK [21]           2012   Fast Retina Keypoint;
                            binary descriptor


8.2.2 Feature Matching 


The final step in finding sparse pixel correspondences is the assignment of the
extracted image features across different image setups, e.g., in temporally
sequential images for sparse optical flow, in stereo image pairs for
feature-based sparse disparity estimation, or in image patches for object
detection.

As in the case of the previous algorithmic steps, many approaches for
descriptor matching have been presented in recent years [3]. In order to
determine the similarity of two image features, multiple correspondence
measures are available. In addition, various matching methods lead to
significant differences in matching results, which influence the resulting
pixel correspondence lists. Finally, some matching methods require a list
search algorithm, for which different approaches are again available. Each
aspect is briefly reviewed in the following subsections.


Correspondence measures for image features 

For histogram-based descriptors $d \in \mathbb{R}^l$, which are real-valued vectors of
dimension $l \in \mathbb{N}$, multiple vector norms are applicable to the difference
vector of two descriptors as a similarity measure. The sum norm is defined as
the sum of the component-wise absolute differences:

$$
\| d_x - d_y \|_1 = \sum_{i=1}^{l} | d_{x,i} - d_{y,i} |
$$


In order to weight large vector differences more than small differences, the
Euclidean norm can be used. This norm penalizes large vector differences more
than small ones by accumulating the component-wise squared differences:

$$
\| d_x - d_y \|_2 = \sqrt{ \sum_{i=1}^{l} ( d_{x,i} - d_{y,i} )^2 }
$$


Since only relative correspondence measures are used for feature matching, 
the square root is skippable to avoid costly computations. 
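The following minimal sketch illustrates both norms and why the square root can
be skipped: the ranking of candidates is the same with and without it. The
function names and the toy descriptors are assumptions for illustration.

```python
import numpy as np

def sum_norm(dx, dy):
    """Sum norm (L1): accumulated absolute component differences."""
    diff = np.asarray(dx, dtype=np.float64) - np.asarray(dy, dtype=np.float64)
    return float(np.abs(diff).sum())

def squared_euclidean(dx, dy):
    """Squared Euclidean norm; the square root is omitted on purpose."""
    diff = np.asarray(dx, dtype=np.float64) - np.asarray(dy, dtype=np.float64)
    return float(diff @ diff)

# Toy example: rank three candidates against one query descriptor.
query = [0.0, 0.0]
candidates = [[3.0, 4.0], [1.0, 1.0], [0.0, 6.0]]
ranking_squared = sorted(range(3), key=lambda j: squared_euclidean(query, candidates[j]))
ranking_rooted = sorted(range(3), key=lambda j: np.sqrt(squared_euclidean(query, candidates[j])))
```

Because the square root is monotonic, `ranking_squared` and `ranking_rooted`
contain the same order of candidates.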

A further method for evaluating the distance of two vectors is the
normalized cross correlation:

$$
\mathrm{NCC}(d_x, d_y) = \frac{ \sum_{i=1}^{l} d_{x,i} \, d_{y,i} }{ \| d_x \|_2 \, \| d_y \|_2 }
$$


The correlation yields good results for the matching of image features, but 
leads to high computational complexity [3] and is therefore rarely used for 
matching of image features in the field of advanced driver assistance systems. 

For binary descriptors, which consist of a bit string of length $n$ that
represents the results of pixel-wise tests, the correspondence measure is the
Hamming distance, which is the accumulation of the bit-wise XOR of
the bit strings:

$$
\| d_x - d_y \|_{\mathrm{ham}} = \sum_{i=1}^{n} ( d_{x,i} \oplus d_{y,i} )
$$

Because this correspondence measure is so simple, the distance computation
for two binary descriptors is typically noticeably faster than the distance
computation for two histogram-based descriptors. Conversely, not every binary
descriptor reaches a quality level comparable to that of histogram-based
descriptors for certain applications. When selecting a specific descriptor
type, the implicit trade-off between execution time and descriptor quality has
to be taken into account.
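A minimal sketch of the Hamming distance for byte-packed binary descriptors is
given below. The packed `uint8` storage layout and the function name are
assumptions for illustration; the computation itself is only an XOR followed by
a bit count.

```python
import numpy as np

def hamming_distance(dx, dy):
    """Number of differing bits between two packed uint8 descriptors."""
    differing = np.bitwise_xor(dx, dy)          # bit-wise XOR of the bit strings
    return int(np.unpackbits(differing).sum())  # accumulate the set bits
```

On hardware with a population count instruction, the same operation reduces to
a few machine instructions per descriptor word, which explains the speed
advantage over floating point vector norms.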


Matching methods for image features 

The quality of resulting pixel correspondences highly depends on the utilized 
matching method. Three different methods have been established in the field 
of feature matching for advanced driver assistance systems (from [3]), which 
show different behavior in the matching inlier/outlier ratio: 


1. Threshold-Based Matching (TB)
   Two features match if the distance between the descriptors is below a
   predetermined threshold. A feature may have several matches, and several
   of them may be correct.
2. Nearest-Neighbor-Based Matching (NNB)
   Two features match if the descriptor $d_y$ is the nearest neighbor to $d_x$
   and if the distance between the descriptors is below a threshold. A feature
   has only one match.
3. Nearest-Neighbor Distance Ratio Matching (NNDR)
   Two features match if the descriptor $d_y$ is the nearest neighbor to $d_x$
   and if the ratio $\varepsilon$ between the distances to the first and the
   second nearest neighbor $d_z$ is below a threshold:

   $$
   \varepsilon = \frac{ \| d_x - d_y \|_p }{ \| d_x - d_z \|_p }
   $$

   where $p$ indicates the type of norm. This ratio avoids ambiguous matches
   in case there are potential matches with a similar distance. Again, a feature
   has only one match.
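The three matching methods listed above can be sketched on a precomputed
distance matrix as follows. This is an illustrative sketch, not the
implementation used in the chapter: the function names, the use of the sum norm
and the threshold values are assumptions. The ratio test is written as a
multiplication, which is one possible way to avoid the division.

```python
import numpy as np

def distance_matrix(A, B):
    """Sum-norm distances between all rows of A (n x l) and B (m x l)."""
    return np.abs(A[:, None, :] - B[None, :, :]).sum(axis=2)

def match_tb(D, threshold):
    """Threshold-based: every pair below the threshold is a match."""
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(D < threshold))]

def match_nnb(D, threshold):
    """Nearest neighbor, accepted only if its distance is below the threshold."""
    nearest = D.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(nearest) if D[i, j] < threshold]

def match_nndr(D, ratio):
    """Nearest neighbor, accepted only if d1 / d2 < ratio (needs >= 2 candidates)."""
    order = np.argsort(D, axis=1)
    matches = []
    for i in range(D.shape[0]):
        first, second = order[i, 0], order[i, 1]
        if D[i, first] < ratio * D[i, second]:  # equivalent to d1 / d2 < ratio
            matches.append((i, int(first)))
    return matches
```

With a small ratio threshold, a feature whose two best candidates are almost
equally distant is rejected, whereas TB matching would report both candidates.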


The matching quality of both nearest-neighbor approaches is higher than that of
the TB matching [3], because the probability of a correct match is higher for
the nearest neighbor matchings than for the TB matching, although the distance
between similar descriptors may vary significantly. The nearest neighbor
matchings select only the best match below the threshold and reject all
others; thus, there are few false matches. In addition, the NNDR matching
penalizes descriptors which have many similar matches, i.e., cases in which the
distance to the nearest neighbor is comparable to the distance to the second
nearest neighbor. This leads to a further improvement in precision. The
drawbacks of the nearest neighbor matchings are the complexity when matching
two large pools of image features and the computationally costly division for
the NNDR matching.


List search approaches for matching of image features 

The matching of two large pools of image features to find pixel correspondences
in different images is a costly process, because a correspondence
measure and the first two nearest neighbors have to be evaluated for each
possible feature combination. By restricting the pool of feature candidates for
the matching process, a significant reduction of the problem size is achievable.
A possible restriction is based on feature properties, e.g., localization in
the image, orientation or scale. Constraining the feature candidates means that
the pool of all image features has to be scanned for valid candidates, which is
a list search problem.


1. Sorted Linear Candidate Search
   Sorting the pool in advance with respect to the restriction parameter
   enables a reduction in search time. The list index of the first element
   that fulfills the restriction is found by iterative successive
   approximation: the initial step size is half the initial pool size; after
   each iteration, the step size is halved and the search index is incremented
   or decremented depending on whether the restriction criterion is fulfilled.
   The last candidate of the reduced list is then found with a linear search.

2. KD-Tree Candidate Search
   A KD-tree [27] is a search tree with two edges per vertex, which divides
   the remaining set of feature candidates into two sets of the same size at
   each vertex. By stepping through the KD-tree, the index of the first valid
   feature candidate is found efficiently. The disadvantage of this search
   method is the time-consuming a priori construction of the KD-tree, which
   does not pay off for small feature pools. In addition, if the restriction
   search space has a low dimension, other search methods perform faster.
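The sorted candidate search from the first list item can be sketched as
follows. As an assumption for illustration, the successive approximation of the
first valid index is delegated to the standard library's `bisect_left`, which
performs an equivalent interval-halving search; the function name and the
one-dimensional restriction (a closed key interval) are likewise hypothetical.

```python
from bisect import bisect_left

def candidates_in_range(sorted_keys, lower, upper):
    """Return (first, end) so that sorted_keys[first:end] lies in [lower, upper]."""
    first = bisect_left(sorted_keys, lower)  # halving search for the first candidate
    end = first
    while end < len(sorted_keys) and sorted_keys[end] <= upper:
        end += 1                             # linear search for the last candidate
    return first, end
```

For a pool sorted by image row, for example, `candidates_in_range(rows, v - 2, v + 2)`
would return the slice of features within two rows of a query feature at row `v`.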


8.2.3 Survey of Feature-based Self-Calibration 


Extrinsic camera self-calibration is about recovering the extrinsic camera
parameters using scene point correspondences only. Camera self-calibration is
still a wide field of active research with different approaches. Early
approaches are subdivided into those that aim at a 3D reconstruction and those
that do not. The latter covers those algorithms where no information about the
scene in front of the cameras is recovered during optimization.

One of the first approaches was proposed by Longuet-Higgins [28].
The author introduced a linear method to recover the essential matrix, which
is decomposable into the extrinsic parameters. Due to the required number of
image point correspondences, it became known as the 8-point algorithm.
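A minimal sketch of the linear estimation step is shown below, assuming
noise-free, calibrated (normalized) homogeneous image vectors `p` and `q` that
satisfy q_i^T E p_i = 0. The function name is hypothetical, and refinements
such as coordinate normalization, outlier handling and the decomposition into
rotation and translation are deliberately omitted.

```python
import numpy as np

def essential_from_correspondences(p, q):
    """Linear 8-point estimate of E from N >= 8 vector pairs (arrays of shape N x 3)."""
    # Each correspondence gives one linear equation in the 9 entries of E.
    A = np.einsum("ni,nj->nij", q, p).reshape(len(p), 9)
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)                 # null-space vector of A
    # Enforce the essential matrix structure: two equal singular values, one zero.
    u, _, vt = np.linalg.svd(E)
    return u @ np.diag([1.0, 1.0, 0.0]) @ vt
```

The result is defined only up to scale and sign, which is why it is usually
followed by a plausibility check and a non-linear refinement.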

Several following publications proposed optimizations regarding decom- 
position [29], plausibility [30, 31], and outlier handling for the corresponding 
image points [32]. As the linear approaches often lack the required accu- 
racy, they are often followed by a non-linear refinement in a stratified 
process. 

On the other hand, there are algorithms where camera parameters and 3D 
points of the scene are recovered simultaneously. One of those is bundle- 
adjustment [33]. Here a good initialization is required as Gauss-Newton 
optimization is involved. Thus, bundle adjustment is often chosen for the 
non-linear refinement as mentioned before. 

Online calibration procedures are classifiable as recursive or non-recursive.
Recursive, or continuous, self-calibration means that temporal constraints are
also optimized. Thus, image measurements from earlier time steps influence the
current calibration result. Dang et al. proposed a parameter tracking system
involving epipolar constraints and bundle adjustment [34]. In contrast,
non-recursive self-calibration has no temporal constraints. Such methods are
applied in cases of a continuous decalibration or for active systems.
Bjorkmann and Eklundh [35] introduced a real-time update of a restricted
space of the extrinsic parameters. Pettersson and Petersson [36] extended
a robust essential matrix estimation with a fast and robust FPGA feature
extraction. Hansen et al. [37] introduced a parameter estimation for every new
frame, which begins with rectified images, optimizes the extrinsic rotation and
uses a Kalman filter to limit overfitting.

8.3 Extraction of Image Features


Due to its stability and robustness, in respect of the requirements in advanced 
driver assistance systems, the Scale-Invariant Feature Transform (SIFT) by 
Lowe [9] is selected for this application as a state-of-the-art image feature 
descriptor and extractor in order to find sparse pixel correspondences in image 
pairs of a stereo camera system. 


8.3.1 Detection of SIFT-Feature Points 


Lowe's SIFT (Scale-Invariant Feature Transform, [9]) is a blob detector, which
utilizes Lindeberg's scale-space approach [16] to achieve scale invariance.
Blobs are detected by finding local maxima in the approximation of the
Laplacian scale-space. The approximation of the Laplace operator is realized
by the difference of two low-pass filtered images, whose Gaussian kernels
have different variances. The resulting scale-space approximation,
the Difference of Gaussians (DoG), is constructed of several octaves with
different image scales (see Figure 8.15). Every octave is subdivided into
multiple intervals, which indicate the increasing variance of the Gaussian
kernels. The initial interval of each octave arises by subsampling a specific
interval of the previous octave. The DoG pyramid, which represents the
edges on multiple scales and at different granularities, is searched for local
maxima in three dimensions (image position and intervals). After the
detection of feature candidates in the discrete scale-space, their localization
is refined by a Taylor series in order to position the candidates with subpixel
accuracy and to approximate the extrema in the continuous scale-space.
Candidates with low contrast and candidates that are too edge-like are
discarded.

Figure 8.15 Image pyramid. The scale-space is constructed from different octaves, which
consist of multiple intervals. Each interval indicates a specific variance of the Gaussian
kernel used. In order to approximate the Laplace scale-space, the Difference of Gaussians is
determined.
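The construction of one DoG octave and the test for a local maximum in three
dimensions can be sketched as follows. This is a simplified illustration, not
the chapter's implementation: the function names are hypothetical, the initial
standard deviation of 1.6 and three intervals per octave are assumed defaults,
and subpixel refinement as well as contrast and edge rejection are omitted.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at three standard deviations."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian low-pass filter with replicated image borders."""
    k = gaussian_kernel(sigma)
    padded = np.pad(img, len(k) // 2, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def dog_octave(img, sigma0=1.6, intervals=3):
    """Differences of adjacent Gaussian-smoothed images for one octave."""
    step = 2 ** (1.0 / intervals)
    blurred = [gaussian_blur(img, sigma0 * step ** i) for i in range(intervals + 3)]
    return np.stack([b1 - b0 for b0, b1 in zip(blurred, blurred[1:])])

def is_local_maximum(dog, s, y, x):
    """True if dog[s, y, x] is the unique maximum of its 3 x 3 x 3 neighborhood."""
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    return bool(dog[s, y, x] == cube.max() and np.count_nonzero(cube == cube.max()) == 1)
```

The next octave would start from a subsampled version of one of the smoothed
images, so that the same kernels cover twice the scale at a quarter of the
pixel count.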


8.3.2 Description of SIFT-Image Features 


The SIFT descriptor is a histogram-based descriptor and provides rotation
invariance. Before histogram generation, the main orientation of each image
feature is determined in order to align the local image region. To ascertain the
main orientation of an image feature, a histogram of local image gradients is
generated. The contribution of a local gradient to its corresponding orientation
bin is defined by its magnitude and its distance to the feature point. After a
smoothing step, the maximal histogram bin represents the main orientation of
a feature point.

In addition to a reproducible detection of characteristic image points, a
distinctive and robust description of the local neighborhood of the detected
points is indispensable. For the description of image features, the gradient
magnitude and orientation of the DoG pyramid are used. A square pixel
area around the detected feature point is rotated by the feature orientation
(see Figure 8.12) and subdivided into a grid (see Figure 8.16). For each
grid element, an independent histogram of gradients is generated using
orientation and magnitudes. The different histograms are weighted, smoothed
and combined in a vector, which represents the final feature descriptor.
The standard parameters of SIFT, which are suggested by Lowe [9], lead
to 128 dimensions with floating point precision for the feature vector.

Figure 8.16 Generation of feature descriptor. The local neighborhood is subdivided into
independent subregions, which are combined into individual histograms. After weighting
and smoothing, the feature descriptor is generated by concatenating the single histograms
into a resulting feature vector.

Figure 8.17 Extracted SIFT-features with exemplary geometry-based restriction of matching
candidates. By restricting possible matching candidates geometrically, the problem size is
significantly reduced.

An exemplary SIFT-feature extraction of a rectified automotive scene is 
shown in Figure 8.17. The features of the left/right stereo camera are depicted 
in red/green. The scale of the features is illustrated as the circle's diameter, 
the orientation of the features with the additional radius line. 


8.4 Matching of Image Features 


The application of feature matching for advanced driver assistance systems
favors correct pixel correspondences over a set of unstable feature
matches. Therefore, the matching of image features follows a straightforward
approach with a significantly reduced problem size through matching of
selected candidates. In this context, it is of minor interest which feature
detector and extractor are used for the generation of image features.
Because SIFT is a histogram-based descriptor, a vector norm has
to be evaluated as correspondence metric. The sum norm is a trade-off between
computational complexity and conclusive results. Matching with the sum
norm results in a marginally lower matching quality compared to matching
with the Euclidean norm, but with a localization-based restriction of matching
candidates, the matching results are sufficiently accurate.

By constraining the pool of possible matching candidates, the problem size
of feature matching is reduced significantly. The initial brute force matching
requires a computation of the correspondence measure between each feature
of the left image and every feature of the right image. By taking into account
the geometric setup of the stereo camera system, the search space is reduced
to a fraction of the initial problem size, which results in a noticeable
speed-up of matching and fewer wrong pixel correspondences at the same time
(see Figure 8.17).
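One possible form of such a geometric restriction is sketched below. It assumes
rectified images, so that corresponding features lie on nearly the same image
row and the disparity is non-negative and bounded. The function name, the row
tolerance and the maximum disparity are hypothetical parameters, not values
from this chapter.

```python
import numpy as np

def restrict_candidates(left_xy, right_xy, max_row_offset=2.0, max_disparity=128.0):
    """Boolean mask M[i, j]: right feature j is a candidate for left feature i."""
    row_offset = np.abs(left_xy[:, None, 1] - right_xy[None, :, 1])
    disparity = left_xy[:, None, 0] - right_xy[None, :, 0]
    return (row_offset <= max_row_offset) & (disparity >= 0) & (disparity <= max_disparity)
```

Only the pairs marked in the mask need a descriptor distance, so the number of
evaluated correspondence measures drops from the full product of both pool
sizes to the number of geometrically plausible pairs.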

Exemplary results of the initial brute force feature matching and
of the enhanced matching process using the mentioned algorithmic setup are
shown in Figure 8.18. Both stereo input images are overlaid and the image
related features are displayed in red/green for the left/right stereo image. 
The significant increase of matching quality is expressed by the reduction 
of detected false pixel correspondences (blue connections) in relation to the 
correct pixel assignments (yellow connections). For the depicted results of 
feature matching, the sum norm is applied as correspondence measure and a 
localization-based restriction for choosing matching candidates is used. 


Figure 8.18 Exemplary results of feature matching. The left and right stereo images are
overlaid; features of the left/right image are displayed in red/green. Correct matches are depicted
in yellow; false matches are shown in blue. The upper image shows the results of the initial
brute force matching, whereas the lower image shows the results of the enhanced matching
process.

8.5 Extrinsic Online Self-Calibration


Hartley and Zisserman present the fundamentals of extrinsic online
self-calibration in their book [38] about multiple view geometry. The extrinsic
parameters of a stereo system are described by the rotation $R_x \in SO(3)$
and the translation vector $t_x \in \mathbb{R}^3$. Given the extrinsic parameters, the
transformation of a point $X_l \in \mathbb{R}^3$ in the left camera coordinate system into
the right camera coordinate system is described as

$$
X_r = R_x ( X_l - t_x ).
$$


Normally, extrinsic stereo camera calibration comes down to recovering $R_x$
and $t_x$. In the following, $t_x$ is assumed constant and only $R_x$ is recovered.
During rectification, $R_x$ is broken down into

$$
R_x = R_r^{-1} R_l
$$

in order to determine the rotations of the left and the right camera coordinate
system to the common image plane, respectively.

As decalibration is assumed to vary within a small range of only a few
degrees, the recalibration is based on pre-rectified image point
correspondences. The images may be pre-rectified using the camera parameters
from the initial offline calibration or a previous calibration run.

Given $N$ corresponding pre-rectified image points $\tilde{P}_i$ and $\tilde{Q}_i$ for
$i = 1, \ldots, N$ and assuming pinhole camera matrices $K$ for simplicity, the
image points are related to their unit directional image vectors

$$
\tilde{p}_i \simeq K^{-1} \tilde{P}_i, \qquad \tilde{q}_i \simeq K^{-1} \tilde{Q}_i.
$$

These vectors are related by the common epipolar constraint

$$
0 = \tilde{Q}_i^T K^{-T} R \, [t_x]_\times K^{-1} \tilde{P}_i
$$

where $R$ denotes the rotation compensating the decalibration.

Since the decalibration is assumed to be small, optimization close to the
identity matrix has to be avoided due to overfitting. Thus, the image vectors
are re-rotated into the original camera coordinate systems via

$$
p_i = R_l^{-1} \tilde{p}_i, \qquad q_i = R_r^{-1} \tilde{q}_i.
$$

Projecting them onto their respective image planes yields

$$
P_i = K p_i, \qquad Q_i = K q_i.
$$

Given the measured image vector $p_i$, the depth $d_i$ of the scene point $X_i$
and the decalibration $R$, the corresponding image point $Q_i$ may also be
modelled as

$$
Q'_i(R, d_i) = K R R_x ( (p_i d_i) - t_x ).
$$

Because noise rules out an exact solution, the objective function has to
minimize the reprojection error $e_i$ between measured and modelled image points

$$
e_i = \| Q_i - Q'_i(R, d_i) \|.
$$

Thus, the objective function including all image point correspondences is to
minimize the sum of all squared reprojection errors and is formulated as

$$
\operatorname*{argmin}_{R, d} \sum_{i=1}^{N} e_i^2
$$

with $d = [d_1 \ldots d_N]$. The solution is found by a non-linear optimization
method, e.g., Levenberg-Marquardt.
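The reprojection error of this section can be sketched as a residual function
that a non-linear least-squares solver would minimize. Several details are
assumptions made for illustration: the decalibration rotation is parameterized
as an axis-angle vector `w`, the image vectors `p` have a third component of
one so that `d` is the depth along the optical axis, and all function names are
hypothetical.

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix from an axis-angle vector w with three components."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def project(K, X):
    """Pinhole projection of 3D points X (N x 3) to pixel coordinates (N x 2)."""
    x = (K @ X.T).T
    return x[:, :2] / x[:, 2:3]

def reprojection_errors(w, d, p, Q, K, Rx, tx):
    """e_i = ||Q_i - Q'_i(R, d_i)|| with Q'_i = K R Rx (p_i d_i - tx)."""
    R = rodrigues(w)
    X_left = p * d[:, None]                  # scene points in the left camera frame
    X_right = (R @ Rx @ (X_left - tx).T).T   # modelled points in the right camera frame
    return np.linalg.norm(Q - project(K, X_right), axis=1)
```

Stacking `w` and `d` into one parameter vector and passing the residuals to a
Levenberg-Marquardt solver, for example `scipy.optimize.least_squares` with
`method="lm"`, would minimize the sum of squared errors.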


8.6 Application-Specific Algorithmic Parameterization 


The manifold varieties of algorithmic parameterizations for feature-based 
camera self-calibration lead to a sprawling design space, which is barely ascer- 
tainable in its entirety. Two exemplary selected application-specific aspects 
out of this design space are presented in this section. In subsection 8.6.1, 
the impact of differing bit depth of input images on the extraction of SIFT- 
features is shown. The parameterization of the presented matching methods 
is discussed in subsection 8.6.2. 


8.6.1 Decreasing Bit Depth of Input Images 
for Extraction of SIFT-features 


The availability of various cameras and the ongoing development of image
processor technology lead to stereo systems which provide digital images
with a higher dynamic range. A higher bit depth of 8, 12 or 16 bit per pixel
(bpp) promises a higher degree of representable details. However, it is not
proven that a feature extractor will extract features of higher quality when the
bit depth of the input images is increased. For the case of SIFT-feature
extraction for a stereo camera self-calibration, this section shows that 8 bpp
input images and 12 bpp input images lead to almost identical pixel
correspondences.

To ensure full accuracy during computations and to avoid effects of
application-specific optimizations, a floating point software version of the
SIFT-feature extraction is fed with 8 bpp and 12 bpp input images. Depending
on the pixel depth of the input images, a bit depth specific algorithmic
parameter set is configured.

After the SIFT-feature extraction, the nearest-neighbor distance ratio 
matching in combination with a geometry-based restriction of matching 
candidates (GB NNDR) is applied in order to find corresponding pixels. The 
experiment is accomplished with a dataset for which rectified input images 
and related disparity maps exist to validate the detected pixel combinations 
(see Figure 8.19). By checking the disparity of a match position in the 
left input image, it is possible to verify the corresponding match position 
in the right image. A radius offset for the detected matches of e = 0.5 
pixels for the position is tolerated during this investigation. The quantities 
for the extracted features and detected matches are shown in Table 8.3. The 
algorithmic parameters for the different SIFT-feature extractions are chosen 
to yield at least 1,000 features for both input images of the stereo camera 
system.

Figure 8.19 Verification of match positions with disparity maps. For rectified images, the
horizontal difference of feature positions of a corresponding pixel pair equals the related value
of the disparity map. With this technique, it is possible to validate resulting matching lists for
datasets with ground truth disparity maps.

Table 8.3 Numbers of extracted SIFT-features and detected matches for 8 bpp input images
and 12 bpp images. The number of the geometry-based (GB) nearest-neighbor distance
ratio matches (NNDR) drops significantly but ensures a high explicitness of matches. The
algorithmic parameters of the SIFT-feature extraction of the two test cases are adjusted in
order to extract a similar number of features, which lead to an identical number of verified
matches

                                     8 bpp Image   12 bpp Image
#SIFT-features left image            1,056         1,069
#SIFT-features right image           1,011         1,019
#GB NNB matches                      1,013*        1,026*
#GB NNDR matches                     608/60.0%     611/59.6%
#disparity verified matches          542/89.1%     544/89.0%
#matches not valid for evaluation    29/4.8%       28/4.9%
#matches wrong correspondences       37/6.1%       39/6.4%

*n features of the left image have matched with features of the right image;
duplicate assignments in the right image possible.
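The disparity-based verification of matches can be sketched as follows. The
sketch assumes rectified images, a disparity map indexed as
`disparity_map[row, column]` for the left image, and the convention that the
right position equals the left position shifted by the disparity to the left.
The function name is hypothetical, and invalid disparity values, which lead to
matches not valid for evaluation, are not handled here.

```python
import numpy as np

def verify_matches(left_xy, right_xy, disparity_map, tolerance=0.5):
    """True where the right match lies within the tolerance radius (in pixels)."""
    u = np.rint(left_xy[:, 0]).astype(int)
    v = np.rint(left_xy[:, 1]).astype(int)
    shift = np.column_stack([disparity_map[v, u], np.zeros(len(u))])
    expected_right = left_xy - shift         # expected position in the right image
    return np.linalg.norm(right_xy - expected_right, axis=1) <= tolerance
```

Counting the `True` entries of the result corresponds to the number of
disparity verified matches reported in Table 8.3.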


The significant difference between the number of geometry-based NNB
matches and geometry-based NNDR matches is caused by the ratio factor,
by which equivocal correspondences are rejected. A few correct pixel
assignments may be rejected as well using this method, because the matching
difference of those pixel pairs is not sufficiently small. An evaluation of the
resulting absolute numbers is beyond the focus of this chapter, but by comparing
the differences of the two versions of SIFT-feature extraction and matching it
is clear that there is nearly no difference between using an 8 bpp input image
or a 12 bpp input image. To guarantee identical pixel correspondences, a visual
inspection of the matching results is mandatory. In Figure 8.20 the result of
detected SIFT-features of the left input image (blue: identical matches, orange:
exclusive 12 bpp features, red: exclusive 8 bpp features) is shown. Out of
1,069 detected feature positions in the 12 bpp input image, 1,045 (97.8%)
identical feature positions are detected again in the 8 bpp input image. In
addition, there are 24 (2.2%) exclusive 12 bpp feature positions detected and
14 (1.3%) exclusive 8 bpp feature positions detected. Similar numbers are
revealed by comparison for the feature extraction of the different right input
images.

After the geometry-based NNDR matching of both feature sets, a 
comparison of the resulting pairs of matched pixel correspondences shows 
whether feature extraction and matching differ between a 12 bpp input image 
and an 8 bpp input image. As shown in Figure 8.21, the bulk of the pixel 
correspondences are identical (blue lines); out of 611 found correspondences, 
587 pairs (96.1%) are equal. In addition, there are 23 (3.8%) exclusive 8 bpp 
correspondences (red lines) and 24 (3.9%) exclusive 12 bpp correspondences 
(orange lines). 

Figure 8.20 Comparison of the resulting SIFT-features of the left input image for 12 bpp 
images and 8 bpp images. In the 12 bpp input image, an overall number of 1,069 features 
has been detected, whereas in the 8 bpp input image 1,056 features have been determined. 
A subset of 1,045 features (97.8%) is identical in both images (blue). There are 14 (1.3%) 
exclusive 8 bpp feature positions (red) detected and 24 (2.2%) exclusive 12 bpp feature positions 
(orange). 

Figure 8.21 Comparison of the resulting pixel correspondences for the 8 bpp and 12 bpp input 
images. In the 12 bpp input image, an overall number of 611 pixel pairs has been detected, 
whereas in the 8 bpp input image 608 correspondences have been determined. A subset of 587 
pairs (96.1%) is identical in both images (blue lines). Furthermore, there are 23 (3.8%) exclusive 
8 bpp pairs (red lines) and 24 (3.9%) exclusive 12 bpp pixel correspondences (orange lines). 

By tuning the algorithmic parameters in relation to the pixel depth of the 
used input images in this case study, it is possible to extract identical pixel 
correspondences. If there is no reason for further image processing steps, 
which require a proven higher bit depth than an 8 bpp graymap image, it is 
advisable to process the standard 8 bpp image in order to save computation 
resources. 


8.6.2 Threshold-based Feature Matching 


In this context of wide baseline stereo matching, threshold-based feature 
matching is used. As highlighted in subsection 8.2.2, a nearest-neighbor-based 
match is defined as a pair of two descriptors that are nearest neighbors 
in the matching process and whose descriptor distance is below a threshold. 
Furthermore, a feature has only one matching correspondence. In order to 
ensure a high rate of correct matches and a low rate of false matches at the 
same time, the threshold has to be selected in accordance with the algorithmic 
setup and the application-specific image content. Therefore, a method 
for threshold selection is presented in this section. 
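
The matching rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the chapter: the function name `nnb_match`, the use of NumPy and the choice of the L1 distance (sum of absolute differences) are assumptions made for the example. Each left-image descriptor is assigned to its nearest right-image descriptor and the pair is kept only if its distance is below the threshold.

```python
import numpy as np

def nnb_match(desc_left, desc_right, threshold):
    """Nearest-neighbor-based matching with a distance threshold (sketch).

    desc_left, desc_right: arrays of shape (n_left, d) and (n_right, d).
    Returns a list of (left_index, right_index, distance) tuples.
    """
    desc_left = np.asarray(desc_left, dtype=np.float64)
    desc_right = np.asarray(desc_right, dtype=np.float64)
    matches = []
    for i, d_left in enumerate(desc_left):
        # Assumed metric: L1 distance to every right-image descriptor.
        dist = np.abs(desc_right - d_left).sum(axis=1)
        j = int(np.argmin(dist))           # nearest neighbor of feature i
        if dist[j] < threshold:            # keep only sufficiently close pairs
            matches.append((i, j, float(dist[j])))
    return matches

# Tiny usage example with two-dimensional toy descriptors.
left = [[0, 0], [10, 10]]
right = [[0, 1], [50, 50]]
print(nnb_match(left, right, threshold=5))
```

Each left feature receives at most one correspondence, while several left features may still point to the same right feature; a mutual check would be needed to exclude that.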

The underlying assumption for selecting a threshold for the presented NNB 
matching is that there are correct matches with a low descriptor 
distance, false matches with a higher descriptor distance and nothing in 
between. Again, correct and false matches in this experiment are evaluated 
with existing disparity maps of the stereo camera system. The descriptor 
distances of an idealized NNB feature matching are shown in Figure 8.22 (right 
plot). For this experiment, 2 x 10% randomly generated SIFT-descriptors have 
been created, pairs have been matched and the distances have been evaluated 
in a histogram. The resulting distribution of descriptor distances follows a 
Gaussian distribution, defined by mean μ₂ and deviation σ₂. Obviously, those 
descriptor distances belong to false matches. Correct matches follow the same 
type of distribution, but with a different mean μ₁ and deviation σ₁, as depicted 
in Figure 8.22 (left plot). Because descriptor distances are by definition sums 
of absolute values, negative distances are not possible. 

Figure 8.22 Histogram of randomly generated SIFT-descriptor distances of an idealized NNB 
feature matching. The right distribution with mean μ₂ displays the distances of wrong matches, 
whereas the left distribution with mean μ₁ illustrates the correct matches. 

Comparing the distance histogram of the synthetic idealized NNB 
feature matching (see Figure 8.22) with a real-world NNB SIFT-feature 
matching (see Figure 8.23, left plot) reveals two distinctive differences: 
firstly, the distance distributions of the correct and the false 
feature distances overlap, and secondly, both distributions are skewed in 
the direction of the other distribution's mean value. This distortion is explained 
by the fact that there are always unavoidable false positives and false 
negatives during the matching process. Further information concerning the 
distance distribution is available in [39]. 

The resulting distance distribution for the NNB SIFT-feature matching is 
shown in Figure 8.23 (right plot). Based on this plot, a suitable threshold for 
the matching process has to be extracted. Ideally, the threshold rejects all 
false matches and accepts all correct matches, which corresponds to a 
threshold between the two ideal distributions. Due to skewing and overlapping 
of the distributions, there is always a set of false matches that has to be 
tolerated by the chosen threshold. Therefore, the goal is to minimize the false 
matches and maximize the correct matches simultaneously. 

Figure 8.23 Histogram of descriptor distances for a NNB SIFT-feature matching with the 
extracted threshold according to Otsu. Distances of correct/wrong matches are displayed in 
blue/orange. The complete distribution is shown in purple. 

Using the Otsu method [40], two overlapping distributions are separable 
by applying the discriminant criterion and utilizing the zeroth- and first-order 
cumulative moments of the distance histogram. Originally, Otsu presented 
his method for binarization of grey scale images, but the algorithm may be 
generalized for different types of histogram decomposition. By separating the 
two Gaussian distributions with Otsu's method, the descriptor distance which 
divides the distribution into a correct and a false region is determined and set 
as the matching threshold. Four different case studies have been executed (see 
Figures 8.23 and 8.24). Even for distance distributions that do not show 
such a clear composition of two Gaussian distributions as the SIFT-feature 
matching case, Otsu's method provides reasonable thresholds. 

For the entire application of wide baseline stereo matching, the threshold 
extraction has been performed offline, but it is also conceivable to implement 
an adaptive frame-to-frame online threshold extraction. 


8.6.3 Parameterization of Matching Methods 


The aim of this section is the evaluation of the presented matching procedures 
(see subsection 8.2.2) and the related parameter sets regarding their quality 
of assigned pixel correspondences in stereo camera system images. The 
presented matching methods (TB, NNB, NNDR) result in varying correspondence 
lists, each of different size and with a variable percentage of correct 
pixel correspondences. The matching technique, which provides a high rate 
of correct correspondences for this application and a low rate of wrong 
assignments simultaneously, has to be identified. 

It is possible to speed up the matching process through helpful assumptions 
about the position of corresponding feature points based on the given geometry 
of the stereo camera system. Using a spatial pre-selection of detected feature 
points, the number of candidates for the subsequent descriptor matching 
is significantly limited. In addition to reducing the problem size for the 
matching step, the quality of the feature point correspondences is increased. 
This is caused by excluding matching candidates, which are geometrically 
contradictory for the used camera setup. Despite the possibility of highly 
similar descriptors, wrong correspondences are prohibited even before the 
matching step using this technique. 

[Figure 8.24, panels: (b) Case study: A-KAZE-feature matching; (c) Case study: 
BRIEF-feature matching] 

Figure 8.24 Histograms of descriptor distances for different NNB feature matching case 
studies with the extracted threshold according to Otsu. Distances of correct/wrong matches 
are displayed in blue/orange. The complete distribution is shown in purple. Due to different 
descriptors and resulting matching distances, various axis scales are used for clear presentation. 

Geometry-based feature matching 

The effect of spatial restriction of possible matching candidates (see 
Figure 8.17) in order to reduce the problem size for the feature matching 
depends on the permissible window size for matching candidates. In Table 8.4, 
an overview of the average number of candidates per matching event is given. 
In the left/right 8-bit input image, 1,057/1,011 SIFT-features are extracted, 
which leads to 1,011 x 1,057 descriptor comparisons when a brute force 
approach is used. With a window size of +/-4 pixels in y-direction (for 
rectified input images) and +100/-4 pixels in x-direction, the average number 
of descriptor comparisons is reduced to 7 x 1,057, which is a reduction of 
problem size of two orders of magnitude. The exact numbers of the candidate 
distribution for the geometry-based matching are shown in Figure 8.25. The 
reduction of problem size by a factor of 144 using the geometry-based feature 
matching in relation to the global matching clearly outweighs the cost of 
testing whether a detected feature is a matching candidate. Therefore, using 
the geometry-based matching approach is advisable. 

Table 8.4 Results for a SIFT-feature matching for a global matching and a geometry-based 
feature matching. The window size for the geometry-based feature matching is +/-4 pixels in 
y-direction and +100/-4 pixels in x-direction 


                               Global Matching              Geometry-Based Matching 
#SIFT-features left image      1,057                        1,057 
#SIFT-features right image     1,011                        1,011 
#avg matching candidates       1,011 @ 1,057 matchings      7 @ 1,057 matchings 
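
The spatial pre-selection can be sketched as a simple window test. The following Python example is a sketch under assumptions: the function name `candidate_indices`, the (x, y) point layout and in particular the sign convention of the horizontal offset (left x minus right x, allowed between -4 and +100) are not specified in the text and are chosen here only for illustration. The last lines reproduce the problem-size arithmetic of Table 8.4.

```python
import numpy as np

def candidate_indices(pt_left, pts_right, dy=4, dx_neg=4, dx_pos=100):
    """Indices of right-image features inside the search window (sketch).

    pt_left: (x, y) of a left-image feature; pts_right: array of (x, y) rows.
    """
    pts_right = np.asarray(pts_right, dtype=np.float64)
    ddx = pt_left[0] - pts_right[:, 0]    # assumed sign: x_left - x_right
    ddy = pts_right[:, 1] - pt_left[1]    # small for rectified images
    mask = (np.abs(ddy) <= dy) & (ddx >= -dx_neg) & (ddx <= dx_pos)
    return np.flatnonzero(mask)

# Only descriptors of the returned candidates need to be compared.
right_points = [(150, 52), (210, 50), (203, 47), (90, 50), (180, 60)]
print(candidate_indices((200, 50), right_points).tolist())

# Problem size from Table 8.4: global versus geometry-based matching.
n_left, n_right, avg_candidates = 1057, 1011, 7
global_comparisons = n_left * n_right          # 1,068,627
windowed_comparisons = n_left * avg_candidates # 7,399
print(global_comparisons / windowed_comparisons)
```

The ratio of the two comparison counts equals 1,011/7, which is the factor of roughly 144 quoted above.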


Choosing a matching method 

Different methods of feature matching with or without a spatial restriction 
of the matching candidates directly affect the quality of the resulting feature 
correspondence lists. Exemplary numbers for a variation of matching methods 
are shown in Table 8.5. Again, in the left/right 8-bit input image, 1,057/1,011 
SIFT-features are extracted. The total number of detected matches for each 
algorithmic combination is given in brackets (see Table 8.5). 

Figure 8.25 Exemplary histogram for the distribution of matching candidates for the 
geometry-based feature matching (see Table 8.4). The average number of candidates is 7 
candidates per matching event. 

Table 8.5 Results of disparity verified feature correspondences for different combinations of 
global and spatial restriction matching methods. In addition to a high rate of correct matches, 
a minimal number of pixel correspondences has to be given for a reliable subsequent image 
processing. The total numbers of detected matches for selected algorithmic combinations are 
given in brackets. The numbers of correct matches and wrong matches do not add up to 100% 
because of missing values in the ground truth disparity maps. Those values are skipped for 
evaluation 


                                     Global Matching            Geometry-Based Matching 
                                     #Correct       #Wrong      #Correct       #Wrong 
                                     Matches        Matches     Matches        Matches 
#TB disparity verified matches       562 (1,057)    400         702 (1,006)    240 
                                     53.2%          37.8%       69.8%          23.9% 
#NNB disparity verified matches      541 (735)      149         556 (597)      22 
                                     73.6%          20.3%       93.1%          3.7% 
#NNDR disparity verified matches     493 (540)      31          542 (605)      36 
                                     91.3%          5.7%        89.6%          6.0% 

For each method, the geometry-based feature matching either improves 
the correct matching rate or leaves it in the same order of 
magnitude. The correspondence lists generated with the threshold-based 
feature matching have the highest number of entries, but the share of 
correct matches is too low. A combination of NNB-matching and 
the geometry-based restriction leads to the highest rate of correct matches 
(93.1%) and, at the same time, a low rate of wrong matches (3.7%). 
Furthermore, the absolute number of correct matches (556) guarantees a stable 
base for subsequent image processing algorithms. Therefore, the use of 
NNB-matching with a geometry-based restriction of matching candidates is 
recommended in order to extract pixel correspondence lists for a feature-based 
camera self-calibration. 


Accuracy of localization 

All prior investigations in this section are based on the assumption that 
‘disparity verified matching’ defines the consensus of the extracted feature- 
based disparity including a small offset e and the related actual disparity taken 
from the disparity ground truth map. This offset e is necessary in order to 
tolerate small deviations of feature positions, which are caused during the 
localization step. 

Figure 8.26 Rates of disparity verified pixel correspondences for different offsets e and three 
matching methods. For all methods, the rate of correct matches runs into saturation. The NNB 
matching method performs best over all offsets e. (TB: Threshold-Based Matching; NNB: 
Nearest-Neighbor-Based Matching; NNDR: Nearest-Neighbor Distance Ratio Matching). 


To evaluate the impact of varying offsets e, in the left/right 8-bit input 
image, 1,057/1,011 SIFT-features are extracted and matched with a 
geometry-based approach for the TB, NNB and NNDR matching methods. The rates of 
disparity verified pixel correspondences for different offsets e are shown in 
Figure 8.26. Remarkably, the qualitative trend is identical for all matching 
methods. Furthermore, all methods run into saturation for offsets higher than 
3 pixels. As expected, the threshold-based matching (TB) provides the lowest 
matching rate for all offsets e. The nearest-neighbor-based (NNB) matching 
method consistently results in the highest rate of disparity verified matches, 
with more than approximately 90% (>537 out of 597 matches) for an offset larger 
than 1 pixel. It is worth mentioning that 70% of all matches (419 out of 597 
matches) for the NNB method are identical to the ground truth disparity map 
(offset e = 0 pixels). 

To achieve an applicable trade-off between exact ‘disparity verified corre- 
spondences’ and permitting localization errors due to viewpoint changes, all 
prior investigations have been verified with an offset e = 3 pixel. 
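
The verification rule with an offset e can be sketched as follows. This is an illustrative Python sketch, not the evaluation code of the chapter: the function name `classify_match`, the convention that the ground truth map is indexed by left-image pixel coordinates, the disparity sign (left x minus right x) and the encoding of missing ground truth as NaN or non-positive values are all assumptions.

```python
import numpy as np

def classify_match(pt_left, pt_right, disparity_map, e=3.0):
    """Classify a pixel correspondence against a ground truth disparity map (sketch).

    Returns "correct", "wrong" or "not valid" (no ground truth available).
    """
    x_l, y_l = pt_left
    x_r, _ = pt_right
    d_true = disparity_map[int(round(y_l)), int(round(x_l))]
    if not np.isfinite(d_true) or d_true <= 0:
        return "not valid"                 # skipped for evaluation
    d_feat = x_l - x_r                     # feature-based disparity (assumed sign)
    return "correct" if abs(d_feat - d_true) <= e else "wrong"

# Usage on a toy ground truth map with a constant disparity of 5 pixels.
gt = np.full((10, 10), 5.0)
gt[0, 0] = np.nan                          # missing ground truth value
print(classify_match((7, 3), (2, 3), gt))          # exact disparity
print(classify_match((7, 3), (0, 3), gt, e=1.0))   # 2 pixels off, e = 1
print(classify_match((0, 0), (0, 0), gt))          # no ground truth
```

Sweeping e from 0 upwards and counting the "correct" results per matching method would yield curves of the kind summarized in Figure 8.26.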


8.7 Hardware Based SIFT-Feature Extraction 


Fast and reliable extraction of SIFT-features in the presented context of 
feature-based camera self-calibration requires a tuned implementation of the 
algorithm for the hardware platform used. Therefore, in this section, the 
relevant hardware properties of SIFT-feature extraction are introduced and 
an overview of existing SIFT-feature implementations is given. 


8.7.1 Challenges of SIFT-Feature Extraction 


The extraction of SIFT-features is a challenging task due to the number of 
operations and memory accesses that have to be executed. As depicted in 
Figure 8.27, the algorithmic steps of SIFT-feature extraction differ in varying 
ratios of control complexity and regular arithmetic. As shown in [41], the 
building of scale-space, which consists of multiple separable and symmetric 
Gaussian filters, is an arithmetically intensive task with almost no control 
overhead. In contrast, parts of the feature points detection or the descriptor 
generation require control mechanisms, which result in heavy branching on 
conventional processors. Furthermore, the scale-space is mandatory for the 
feature description and has to be buffered until the generation of descriptors. 
This requires a large memory, and arbitrary non-aligned memory accesses 
aggravate the challenging memory bottleneck. In addition, the algorithmic 
quality of SIFT has to be ensured for subsequent processing steps, which 
requires an appropriate level of internal accuracy of the intermediate results. 
Therefore, specialized architectures are necessary to ensure the processing 
performance demanded for SIFT-feature extraction. At the same time, those 
specialized systems have to be as flexible as possible to guarantee a fast 
implementation of future algorithms which might perform better compared 
to state-of-the-art feature extractors [42]. 

Figure 8.27 Breakdown of SIFT-feature extraction into four algorithmic steps and the related 
qualitative shares of control complexity and arithmetic complexity (i.e., regular arithmetic). 
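
To illustrate why the scale-space construction is regular arithmetic with almost no control flow, the following Python sketch filters an image with a separable Gaussian: one horizontal and one vertical pass with the same one-dimensional kernel. It is a sketch only; the function names, the kernel radius of three standard deviations and the edge padding are assumptions and do not describe any of the hardware implementations discussed in this section.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius=None):
    """Symmetric, normalized one-dimensional Gaussian kernel."""
    if radius is None:
        radius = int(np.ceil(3.0 * sigma))   # assumed truncation at 3 sigma
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def separable_gaussian(image, sigma):
    """Gaussian smoothing as a row pass followed by a column pass."""
    k = gaussian_kernel_1d(sigma)
    img = np.asarray(image, dtype=np.float64)
    pad = len(k) // 2
    padded = np.pad(img, pad, mode="edge")   # replicate border pixels
    # Horizontal pass: every row is convolved with the 1-D kernel.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    # Vertical pass: every column of the intermediate result is convolved.
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

# One octave of a scale-space would repeat this for increasing sigma values.
img = np.random.default_rng(0).random((16, 16))
levels = [separable_gaussian(img, s) for s in (1.0, 1.4, 2.0)]
print([lvl.shape for lvl in levels])
```

With a kernel of 2r + 1 taps, the two passes need 2(2r + 1) multiplications per pixel instead of (2r + 1)² for a full two-dimensional kernel, and the symmetry of the kernel allows a further reduction by adding mirrored samples before multiplying.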


8.7.2 Existing Systems for Hardware Based 
SIFT-Feature Extraction 


In the following Table 8.6, a set of existing systems/platforms for 
hardware-based SIFT-feature extraction is presented. The selection shown is not meant to 
be exhaustive, but elucidates the trade-off of different platforms regarding 
sufficient processing power, low power consumption and satisfactory flexibility 
for future algorithm implementations. 

In 2015, Moren et al. [43] presented a comprehensive survey of SIFT-feature 
extraction for homogeneous and heterogeneous CPU/GPU systems. 
With different techniques for parallelization and a portable performance 
concept using OpenCL (Open Computing Language), the SIFT-feature extraction 
has been implemented on various single device and multi-device platforms. 


Table 8.6 Overview of existing systems for SIFT-feature extraction 


Implementation  Author                    Year  Device               Power    Frequency    Performance 
                                                                              (MHz)**      (fps) 
CPU & GPU       Moren et al. [43]         2015  Nvidia GTX 780 Ti    250 W*   875          137.6 @ 640x480 
                                                AMD R9-290           300 W*   947          98.7 @ 640x480 
                                                Nvidia GTX 580       244 W*   772          77.2 @ 640x480 
                                                Nvidia Tesla C2050   238 W*   1150         74.0 @ 640x480 
                                                Intel MIC 3120A      300 W*   1100         16.8 @ 640x480 
                                                Intel Core-i7 4930K  130 W*   3400         32.6 @ 640x480 
                                                Intel Xeon E5-2667   130 W*   2900         28.3 @ 640x480 
                                                AMD Opteron 6168     115 W*   1900         8.0 @ 640x480 
                                                Intel Xeon E5-2667   130 W*   2900         4.0 @ 640x480 
Mobile GPU      Rister et al. [44]        2013  Snapdragon S4        ≈4 W     1,700/400    9.9 @ 320x240 
                                                Nexus 7              n/a      1,600/520    8.6 @ 320x240 
                                                Galaxy Note II       n/a      1,600/400    7.6 @ 320x240 
                                                Tegra 250            ≈3 W     1,000/333    7.9 @ 320x240 
FPGA & ASIC     Bonato et al. [45]        2008  Altera Stratix II    n/a      100          30.0 @ 320x240 
                Yao et al. [46]           2009  Xilinx Virtex 5      n/a      100          32.3 @ 640x480 
                Huang et al. [47]         2012  TSMC 0.18 µm CMOS    n/a      100          30.0 @ 640x480 
                Yum et al. [48]           2015  Xilinx Virtex 6      n/a      170          36.9 @ 1280x720 
ASIP            Mentzer et al. [41, 42]   2015  TSMC 45 nm process   <1 W     400          1 @ 800x640 


*Thermal Design Power. 
**For category mobile GPU: CPU/GPU frequency. 

The systems are separated into four different implementations, where each 
implementation is optimized according to device-specific characteristics: 


• Host-device implementation for control 
• GPU device implementation 
• Multi-core CPU device 
• Multi-device implementation 

The systems are evaluated for multiple image sizes with equal algorithmic 
setups. Single device runtimes are listed in Table 8.6 for VGA image size. 
Notably, all single-GPU systems and all multi-device systems that include 
a GPU provide enough performance for real-time SIFT-feature extraction 
on VGA images, but require more than 230 W of power. 
Furthermore, CPU single device systems are close to real-time 
by providing 17-32 fps, but again, with over 115 W, the power consumption 
is far too high for use in automobiles. The AMD 
Opteron 6168 and Intel Xeon E5 do not reach a sufficient frame rate for 
a SIFT-feature extraction application. The authors present three different 
heterogeneous systems, assembled from the aforementioned single 
device systems, which provide enough performance for real-time applications 
even for very large images. For all systems, flexibility is ensured by using 
the high-level OpenCL framework. 

In 2013, Rister et al. [44] presented an investigation of SIFT-feature extraction 
on four different platforms using mobile GPUs. The authors used a 
heterogeneous dataflow scheme and partitioned the workload between 
CPU and GPU. Different platform-specific optimizations are used, e.g., data 
compression by pixel reordering or branchless convolution through on-the-fly 
code generation. With frame rates between 7.6 fps and 9.9 fps, the 
performance is too low for use in ADAS, but a power consumption of the 
complete systems of <5 W fulfills the requirements demanded. Furthermore, 
flexibility is guaranteed by OpenGL for Android. 

In 2008, Bonato et al. presented the first hardware-based SIFT 
implementation [45]. The heterogeneous system consists of a hardware accelerator for 
SIFT-feature detection and a NIOS II softcore processor for SIFT-descriptor 
generation. The system has been emulated on an Altera Stratix II FPGA and a 
frame rate of 30 fps for QVGA images has been reached. 

One year later, in 2009, Yao et al. [46] claimed to reach a comparable frame 
rate of 32.3 fps, but for VGA images. They presented a hardware-based 
SIFT-feature detector, which has been emulated on an ML507 board, and a 
SIFT-feature generation in software. The drawback of the presented work is the 
simplified SIFT scale-space, which leads to a limited algorithmic quality 
compared to the original algorithm. 

The first fully hardware-based SIFT-feature extraction was presented 
in 2012 by Huang et al. [47]. The authors' system reaches a frame rate of 30 
fps for VGA images and uses a TSMC 0.18 µm CMOS process. 

In 2015, Yum et al. proposed an FPGA-based full SIFT implementation, 
which is capable of processing 36.85 fps for HD images on a Xilinx Virtex 
6 device [48]. By reducing the amount of necessary internal memory and using 
a local-patch reuse scheme, a high data throughput is reached, but the building 
of scale-space is adjusted, which affects the algorithmic quality. 

These hardware-based approaches provide adequate processing power for 
a high frame rate and a sufficiently low power consumption of typically <10 W, 
but the presented systems are not software-flexible. 

Mentzer et al. [41, 42] presented an ASIP-based SIFT-feature extraction, 
which preserves the full algorithmic quality. Sufficient flexibility for future 
algorithms of image feature extraction is ensured by the platform-specific 
attribute of full software programmability. The drawback of the presented 
case study is the low frame rate in FPGA emulation, which prohibits a real 
time application in automotive use. 

Thus, heterogeneous systems consisting of dedicated hardware for 
accelerating the scale-space construction and a processor-based descriptor generation 
are a promising trade-off between flexibility, performance and power 
consumption. State-of-the-art conventional CPUs and GPUs consume too much power, 
today's mobile GPUs do not reach sufficient frame rates, and purely 
hardware-based systems do not fulfill the requirements for flexibility. A trade-off 
concerning flexibility by supporting a processor with non-programmable 
hardware accelerators is a possible approach for a SIFT-feature extraction 
in the field of Advanced Driver Assistance Systems. 


8.8 Conclusion 


In this chapter, selected aspects of self-calibration for wide baseline stereo 
camera systems for automotive applications have been introduced. Starting 
at the extraction and matching of image features up to the extrinsic online 
self-calibration of stereo camera systems, fundamental algorithms have been 
presented. A promising algorithmic combination consisting of the extraction 
of SIFT-features, nearest-neighbor-based matching with spatial selection of 
matching candidates and the estimation of camera parameters in order to 
rectify misaligned stereo images has been discussed in detail. 

Three exemplary aspects of algorithmic parameterization, namely 
the impact of a decreasing bit depth of input images, the selection of a 
matching method and the threshold selection for the matching process, have 
been examined in detail to illustrate the complexity of adjusting 
existing algorithms to new applications. 

In the last section, basic challenges of hardware-based SIFT-feature 
extraction are presented and hardware-specific solutions for the 
aforementioned algorithmic challenges are discussed. Finally, existing systems for the 
extraction of SIFT-features are reviewed. 

As discussed in this chapter, there is no state-of-the-art hardware 
implementation for the proposed algorithmic combination that fulfills all three 
requirements for ADAS by delivering sufficient processing performance, low 
power consumption and full flexibility for future algorithms. Thus, these remaining 
challenges have to be solved to improve safety for vulnerable road users and to 
enhance comfort in future automobiles. 


9.1 Introduction 


Automated functions for real world traffic scenarios have been increasing in 
recent years in the automotive industry. Many research contributions have been 
made in this field. However, other problems have arisen for drivers, related 
to the legal and liability framework, where it is still unclear up to which 
point the control of the vehicle should stay with the driver or be taken by 
automation. 

The aim of Advanced Driver Assistance Systems (ADAS) is mainly 
to help drivers in safety-critical situations rather than to replace them. 
However, in recent years, many research advances have been made in this 
field, bringing automated driving closer to reality day by day. The number 
of automated driving functions for typical traffic scenarios has increased 
in the last few years in the automotive industry and university research. 
However, other problems have appeared for drivers of such automated cars: 
When should the driver or the automated systems take control of the vehicle 
(since both cannot control an automated vehicle together at the same time 
due to potential conflicts)? This question does not have a simple answer; it depends 
on different conditions, such as the environment, driver condition, vehicle 
capabilities and fault tolerance, among others. Arbitration and control activities 
have been implemented in DESERVE WP24, mainly motivated by this 
question. 

In this chapter, we will analyze the acceptance of the ADAS functions 
available in the market, and its relation to the different control actions. 
A survey on arbitration and control solutions in ADAS is presented. It 
creates the basis for the future development of a generic ADAS 
control (the lateral and longitudinal behavior), based on the integration of 
the application request, the driver behavior and driving conditions in the 
framework of the DESERVE project. Based on vehicle modeling, driver 
behavior and intention, a first approach for arbitration and control strategies, 
which can anticipate the priorities on the control in emergency situations, is 
described. 

The main aim of this work is to allow the development of a new generation 
of ADAS solutions where the control could be effectively shared between the 
vehicle and the driver. Some simulations will allow the virtual testing for the 
future implementation in demonstrators. 

Fuzzy logic techniques are a suitable approach for the arbitration control in 
the driving process. The contributions described in this chapter will be 
implemented in two demonstrators (CRF demo vehicles): an Automatic/Autonomous 
Emergency Braking (AEB) pedestrian protection system and a Driver Distraction 
monitoring system, using RTMaps¹ as the development software. 

The proposed arbitration and shared control takes into account the state 
of the driver and the state of the system in order to assess the level of 
control that each should have, based on the standard SAE J3016. Fuzzy 
Logic controllers consider a control level that allows a smooth control sharing 
between the automated system and the driver. It has been designed according 
to the Application Platform in the DESERVE control architecture. Although 
Fuzzy Logic (like some other Artificial Intelligence techniques) is not explicitly 
considered in the functional safety standard for road vehicles (ISO 26262), a large 
number of applications have been developed in recent years. The behavior of 
a human driver can be emulated with this technique. 


9.2 ADAS Functions Available in the Market 


Driver Assistance Systems (DAS) or Advanced Driver Assistance Systems 
(ADAS) can be defined as those active safety systems which require some 
monitoring of the vehicle's environment and of driver intentions. This extra 
information is combined with ego-vehicle data (positions and speed profile) 
in order to provide the driver with some warning or perform some automatic 
actuation with the goal of increasing safety. Regarding driver interactions, a 
DAS can offer: 


• Information about the current situation 
• A warning to alert the driver 
• Taking control of the vehicle, partially or completely 
• A combination of these 


¹ https://intempora.com/ 


This section is focused on those DAS which have the capability of taking 
vehicle control to improve or correct the driver response. 
From the control point of view, control DAS systems can be classified as: 


• Longitudinal Control Systems: Those DAS which are able to modify 
vehicle speed by accelerating or braking. 

• Lateral Control Systems: Those DAS which are able to change vehicle 
direction, usually actuating on the steering system. 

• Global Control Systems: DAS with a combination of longitudinal and 
lateral control. 


The Control DAS examples described in this subchapter are listed below: 

Longitudinal Control Systems 
• ACC (Adaptive Cruise Control) 
• FCW (Frontal Collision Warning or Forward Collision Warning) 
• AEB/CMbB (Automatic Emergency Braking/Collision Mitigation by 
Braking) 
• SLA (Speed Limit Assistant) 

Lateral Control Systems 
• LDW/LKA (Lane Departure Warning/Lane Keeping Assistance) 
• BSD/LCA (Blind Spot Detection/Lane Change Assistant) 

Other Control Systems 
• Pedestrian Detection/Active Hood 
• Driver Distraction Detection 
• PreCrash 
• Parking Assistance 

9.2.1 Longitudinal Control Systems 


These are the main steps for the longitudinal control of the vehicle: the first 
system (ACC) is more a comfort than a safety system, but safety systems such as 
Forward Collision Warning (FCW) or AEB are built upon it. Other possibilities 
for Longitudinal Control of the vehicle are systems such as SLA. 


ACC (Adaptive Cruise Control) 

ACC adds the maintenance of a constant safety distance to the preceding
vehicle to conventional cruise control. It consists of a front-mounted sensor,
an integrated control unit that regulates the system's performance, and a
suitable HMI that informs the driver and allows him or her to control the
system.

The sensor monitors the area in front of the vehicle. If no obstacle is
detected, the vehicle keeps the selected speed, as a standard cruise control
does. If a vehicle is detected in the predicted path of the vehicle (target
vehicle), the sensor calculates the relative distance and speed to the target
vehicle (up to around 150-200 m). Then, the Control Unit decides whether it
is necessary to actuate the brake system of the vehicle in order to keep a
constant safety distance. When the target vehicle disappears from the
detection area, the Control Unit sends the order to accelerate again until the
desired cruise speed is reached.

The system usually works between 30 and 180 km/h. The maximum
deceleration provided by the system (between 2 and 3 m/s²)² is far from the
maximum deceleration capabilities of the vehicle. The driver can choose
between different (time-related) safety gaps. Developed for high-capacity
roads, ACC Stop & Go extends the conventional ACC with a full-stop
capability. The stop and go of the vehicle is thus performed automatically,
so the range of the system is extended to 0-200 km/h.


Figure 9.1 ACC Systems.


² In case the driver does not react, some ACC systems are also complemented
with an AEB system, also considered as CMbB, providing autonomous brake
action (from 5 m/s² to full power).
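
The ACC behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the control law of any production ACC: the gains, the default set speed, the default time gap and the acceleration limit are hypothetical values, and only the 2-3 m/s² deceleration limit is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class AccConfig:
    set_speed_mps: float = 33.0   # driver-selected cruise speed (assumed value)
    time_gap_s: float = 1.8       # driver-selected time-related safety gap (assumed)
    max_decel_mps2: float = 2.5   # comfort limit, inside the 2-3 m/s² range quoted above
    max_accel_mps2: float = 1.5   # assumed comfort limit for acceleration
    k_gap: float = 0.2            # hypothetical gain on the gap error [1/s²]
    k_speed: float = 0.6          # hypothetical gain on the speed error [1/s]


def acc_command(ego_speed_mps, target_gap_m=None, target_rel_speed_mps=0.0, cfg=None):
    """Return the requested acceleration in m/s² (negative means braking).

    target_rel_speed_mps is target speed minus ego speed, so it is negative
    when the ego vehicle is closing in on the target vehicle.
    """
    cfg = cfg or AccConfig()
    # Standard cruise control: converge to the selected speed.
    cruise = cfg.k_speed * (cfg.set_speed_mps - ego_speed_mps)
    if target_gap_m is None:
        accel = cruise  # no target vehicle in the predicted path
    else:
        desired_gap_m = cfg.time_gap_s * ego_speed_mps
        follow = (cfg.k_gap * (target_gap_m - desired_gap_m)
                  + cfg.k_speed * target_rel_speed_mps)
        accel = min(cruise, follow)  # never exceed the cruise request
    # Keep the request inside the comfort envelope of the ACC.
    return max(-cfg.max_decel_mps2, min(cfg.max_accel_mps2, accel))
```

When the clamped request is not enough to keep the gap, the situation is handed over to the warning and emergency braking functions described next.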


FCW (Frontal Collision Warning) 

When the deceleration needed to avoid a possible head-on collision exceeds
what ACC can provide [i.e., it exceeds the comfort specifications (above
2-3 m/s²)], a warning is provided to the driver (FCW). This warning reminds
the driver of the urgent need to take control of the situation. FCW is included
in the basic ACC system in all vehicles equipped with the necessary sensors
(laser, radar, etc.). These systems are usually activated between 5 and
2 seconds before the collision with the vehicle ahead might occur.
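
A common way to express the "seconds before the collision" criterion is the time to collision (TTC), i.e. the gap divided by the closing speed. The sketch below assumes a constant closing speed and a hypothetical warning threshold of 3.5 s, chosen inside the 2-5 s window mentioned above; neither the function names nor the threshold come from the DESERVE project.

```python
import math


def time_to_collision(gap_m, closing_speed_mps):
    """TTC in seconds, assuming the closing speed stays constant.

    closing_speed_mps is ego speed minus target speed; if it is not
    positive the gap is not shrinking and no collision is predicted.
    """
    if closing_speed_mps <= 0.0:
        return math.inf
    return gap_m / closing_speed_mps


def fcw_active(gap_m, closing_speed_mps, warn_ttc_s=3.5):
    """True when the driver should be warned (threshold is an assumed value)."""
    return time_to_collision(gap_m, closing_speed_mps) <= warn_ttc_s
```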


AEB/CMbB (Automatic Emergency Braking/Collision Mitigation by
Braking)

As the third step in the longitudinal control of the vehicle, AEB is an automatic
emergency safety system that takes control of the situation if the driver fails
to decelerate the vehicle when a head-on collision is about to happen. The
system consists of an automatic actuation of the vehicle's brakes when the
situation requires it to avoid a crash. AEB systems can be divided according
to their deceleration into 1) Soft Braking: up to 5 m/s², and 2) Hard Braking:
from 5 m/s² to the full capability of the braking system.

Some systems can provide progressive braking: first, soft braking is applied
and, in case the accident seems unavoidable, hard braking follows. Also, the
brake circuit can be pre-filled in case of possible risk (when the FCW system
is launched), in order to be ready for a full brake application in case it is
required (either by the driver or automatically). When the system is not able
to avoid an accident but can help to mitigate the collision, since the obstacle
is hit at a lower speed, it is called CMbB, Collision Mitigation by Braking.
The only difference is that AEB can really avoid the accident, while CMbB is
launched a short time before an accident that can no longer be avoided.


Figure 9.2 Stages on the longitudinal control of the vehicle: ACC (normal
driving, <3 m/s²); FCW (risk of accident, intervention of the driver
required); AEB (automatic braking, >3 m/s²); CMbB (automatic braking,
collision unavoidable).
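
The staging of the longitudinal functions can be illustrated by comparing the deceleration needed to avoid the obstacle with the limits quoted above. The sketch assumes constant deceleration and an obstacle moving at constant speed, which gives a = v²/(2d) for closing speed v and gap d. The 3 m/s² and 5 m/s² limits come from the text; the 9 m/s² brake capability and the function names are assumptions made for the example, and the driver warning (FCW) is left out for brevity.

```python
def required_deceleration(gap_m, closing_speed_mps):
    """Constant deceleration (m/s²) that cancels the closing speed within the gap."""
    if closing_speed_mps <= 0.0:
        return 0.0  # the gap is not shrinking
    return closing_speed_mps ** 2 / (2.0 * gap_m)


def braking_stage(gap_m, closing_speed_mps,
                  comfort_mps2=3.0, soft_limit_mps2=5.0, brake_capability_mps2=9.0):
    """Map the required deceleration to a stage of the longitudinal control."""
    a = required_deceleration(gap_m, closing_speed_mps)
    if a <= comfort_mps2:
        return "ACC"                # normal driving, comfort braking is enough
    if a <= soft_limit_mps2:
        return "AEB soft braking"   # up to 5 m/s²
    if a <= brake_capability_mps2:
        return "AEB hard braking"   # from 5 m/s² to full brake capability
    return "CMbB"                   # collision unavoidable, only mitigation is left
```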


SLA (Speed Limit Assistant) 
The Speed Limit Assistant (SLA) is a safety system that continuously provides
the driver with information on the most suitable maximum speed during
his or her journey.

The SLA system can be based on several sub-systems:


• TSR (Traffic Sign Recognition): Traffic signs on the road are recognized,
either by vision or by gathering information from a map, and shown to
the driver as a reminder of the prevailing speed limits.

• CSW (Curve Speed Warning): The most suitable recommended maximum
speed for passing the curve ahead, as extracted from the digital maps, is
shown to the driver. Another option is to show just a warning icon in
case the speed is considered too high for the incoming bend.


Figure 9.3 CSW system.


Figure 9.4 TSR system.
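
The text does not state how the recommended curve speed is obtained from the map. One common approximation, used here purely as an assumption, bounds the lateral acceleration v²/R in a curve of radius R, which gives v_max = sqrt(a_lat,max · R). The 3 m/s² lateral acceleration limit and the function names are hypothetical.

```python
import math


def curve_speed_limit(radius_m, max_lat_accel_mps2=3.0):
    """Recommended maximum speed (m/s) for a curve of the given radius."""
    return math.sqrt(max_lat_accel_mps2 * radius_m)


def csw_warning(speed_mps, radius_m, max_lat_accel_mps2=3.0):
    """True when the current speed is too high for the incoming bend."""
    return speed_mps > curve_speed_limit(radius_m, max_lat_accel_mps2)
```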


9.2.2 Lateral Control Systems 


Lateral control systems take care of the lateral dynamics of the vehicle, either 
warning the driver or taking control of the vehicle actuation systems. 


LDW/LKA (Lane Departure Warning/Lane Keeping Assistant)

The Lane Departure Warning system has the task of warning the driver in case
he or she drives out of the lane due to a distraction (without using the
blinkers). Many OEMs today offer a Lane Departure System under different
commercial brands (AFIL, Audi Lane Assist, etc.). It is composed of a sensor
(or several sensors) capable of detecting when the driver is leaving the
chosen lane, a Control Unit and a suitable HMI for the driver.

Lane line detection can be done through two different technologies:


• Infrared sensors placed in the lower part of the vehicle (PSA models):
They use the reflection produced by the emitted light when driving over
a white line to detect if the vehicle is driving over it. In this case,
a Control Unit determines that the driver is departing from the lane and,
depending on some other factors (blinkers, etc.), it can warn him or her
by different methods (making the steering wheel or the seat vibrate, sound
warning, etc.).

• Image processing: A camera, usually placed behind the windshield on
the rear-view mirror housing, provides images which can be analyzed.
Thus, it is possible to determine when the driver is departing from
the chosen lane. This system brings advantages, such as its predictive
capability (it can on obstacles in the already known driving corridor)
and greater robustness in situations such as arrows, providing
considerably fewer false alarms. As a disadvantage, it can be less robust
in case of poor visibility.

In any case, the system works above a certain speed (commonly, from
between 60 and 80 km/h upwards) and can be switched off. Moreover, when
the appropriate blinker is activated, the system understands that the driver
really wants to change lane and no warning is provided when crossing the
lines.

An update of the system is also found in the market: LKA (Lane Keeping
Assistant), which adds a torque on the steering wheel (electrical power
steering is required) that helps the driver to keep the vehicle in the
desired lane.


Figure 9.6 BSD/LCA system.
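
The LDW activation conditions described above (minimum speed, switch-off option, suppression by the blinker) can be summarized in a small decision function. This is a sketch under assumptions: the geometric crossing test, the parameter names and the 60 km/h default are illustrative and not taken from any OEM implementation.

```python
def ldw_warning(speed_kmh, lateral_offset_m, lane_width_m, vehicle_width_m,
                left_blinker=False, right_blinker=False,
                enabled=True, min_speed_kmh=60.0):
    """Return "left", "right" or None.

    lateral_offset_m is the position of the vehicle centre relative to the
    lane centre, positive towards the left line (assumed convention).
    """
    if not enabled or speed_kmh < min_speed_kmh:
        return None  # switched off by the driver or below the working speed
    # A line is crossed when the vehicle edge goes beyond half the lane width.
    crossing = abs(lateral_offset_m) + vehicle_width_m / 2.0 > lane_width_m / 2.0
    if not crossing:
        return None
    side = "left" if lateral_offset_m > 0 else "right"
    # An active blinker on the departure side signals an intended lane change.
    if (side == "left" and left_blinker) or (side == "right" and right_blinker):
        return None
    return side
```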


BSD (Blind Spot Detection) 
A Blind Spot Detection system has the goal of warning the driver in case
another vehicle is located in the blind spot which is not covered by the
rear-view mirrors.

For this purpose, it relies on sensors (commonly, short-range radars at
24 GHz or image processing units) which constantly monitor the area in the
lateral blind spots of the vehicle. These sensors provide information to
a Control Unit, which decides whether to provide the driver with a
warning. This warning can be acoustic, visual or haptic.

Some systems warn continuously of the existence of objects in the
blind spot. Others only warn when the driver expresses his or her intention
to change lane by using the corresponding blinker. They usually work above a
certain speed and are capable of excluding parked vehicles or those driving in
the opposite direction, in order to reduce the false alarm rate. The detection
area can measure around 10 meters behind the rear-view mirror and 4 meters
wide, enough to cover the blind spot.
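
The filtering described above can be sketched as a zone test plus two plausibility checks. The 10 m by 4 m zone comes from the text; the coordinate convention, the speed thresholds and the function names are assumptions made for the illustration.

```python
def in_blind_spot(rel_x_m, rel_y_m, zone_length_m=10.0, zone_width_m=4.0):
    """Zone test.

    rel_x_m: longitudinal position relative to the rear-view mirror
             (negative means behind the mirror).
    rel_y_m: lateral distance from the side of the ego vehicle.
    """
    return -zone_length_m <= rel_x_m <= 0.0 and 0.0 <= rel_y_m <= zone_width_m


def bsd_warning(ego_speed_mps, rel_x_m, rel_y_m, obj_speed_mps,
                min_ego_speed_mps=8.0, min_obj_speed_mps=1.0):
    """obj_speed_mps is the object's absolute speed along the ego direction."""
    if ego_speed_mps < min_ego_speed_mps:
        return False  # system only works above a certain speed
    if obj_speed_mps < min_obj_speed_mps:
        return False  # parked (about 0) or driving in the opposite direction (negative)
    return in_blind_spot(rel_x_m, rel_y_m)
```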


LCA (Lane Change Assistant) 

A Lane Change Assistant is a system which extends the capabilities of a
Blind Spot Detection system. The detection distance can reach up to 50-60
meters behind the ego-vehicle (positions and speed profile of the vehicle) in
the adjacent lanes. Moreover, the relative speed of the detected vehicles is
also taken into account, so the system is capable of warning the driver in case
the lane change is too risky because of a vehicle approaching fast from behind.
Depending on some parameters, different warning levels can be included.
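
One possible way to derive warning levels from distance and relative speed is the time the approaching vehicle needs to reach the ego-vehicle. The sketch below is an assumption, not the logic of a specific LCA product: the 60 m range follows the text, while the two time thresholds and the three-level scale are hypothetical.

```python
def lca_warning_level(distance_behind_m, closing_speed_mps,
                      max_range_m=60.0, caution_ttc_s=3.5, danger_ttc_s=2.0):
    """Return 0 (no warning), 1 (caution) or 2 (lane change too risky).

    closing_speed_mps is the speed of the vehicle behind minus the ego
    speed, so it is positive when that vehicle is approaching.
    """
    if distance_behind_m > max_range_m or closing_speed_mps <= 0.0:
        return 0  # out of sensor range or not approaching
    ttc_s = distance_behind_m / closing_speed_mps
    if ttc_s <= danger_ttc_s:
        return 2
    if ttc_s <= caution_ttc_s:
        return 1
    return 0
```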


9.2.3 Other Control Systems 


Pedestrian detection/Active hood

A pedestrian detection system is capable of recognizing a potential danger. In
this case, the driver can be warned or even an automatic action can be
performed (automatic speed adaptation). In case of an unavoidable crash, the
activation of passive safety measures is also considered (active hood).


PreCrash systems

In the transition or overlap between active and passive safety, PreCrash
systems work when accidents are unavoidable. Their mission is, based on the
information gathered by the rest of the safety systems and after determining
that the accident cannot be avoided by their intervention, to prepare the
passive safety elements of the vehicle to better perform their safety mission.
For instance, when a head-on collision is certain, CMbB will reduce the speed
of the crash, while PreCrash will pre-tension the seatbelts, move the seats to
a more convenient position or pre-trigger the airbag deployment order.
PreCrash systems can cover the front of the vehicle, the rear or all 360°
of the vehicle.


Parking assistance 


Parking assistance is one of the most widely implemented DAS. Many types
of technology are used for it. This section does not focus on the traditional
ultrasonic or vision-aided parking assistance systems, but on the systems that
can provide some kind of support to the driver. These systems can be divided
into the following:


• Vision-Aided Systems: together with the image of a camera placed in the
rear part of the vehicle, some support is provided by visual guidelines in
the dashboard display.

• Top View Systems: up to 4 cameras placed on exposed surfaces around
the vehicle provide images that, after some processing, can be shown on
the vehicle's display as if the vehicle were seen from above.

• Aided Park Systems: some systems can support the driver in
his/her search for parking spots or in his/her maneuvers to park the vehicle.

• Automatic Park System: this system can take control of the steering of
the vehicle in order to park automatically after detection of a parking
slot. The driver remains responsible for the longitudinal control of the
vehicle.


Figure 9.8 Aided park system.


Figure 9.9 Automatic park systems.


9.2.4 Control Solution in ADAS 


Based on most control architectures for automated and semi-automated
vehicles [2], DESERVE is divided into three main platform parts or stages:
perception, application and information-warning-intervention (IWI). The
sensing and perception of environmental and onboard information is vitally
important for any automotive DAS function. Based on preliminary work from
other funded projects in this area³, the information flow and architectural
decomposition of the DESERVE platform is shown in Figure 9.10.

The three main building blocks in Figure 9.10 are the perception layer, the
application layer and the IWI controller layer. The same decomposition was
also chosen by other parties in similar projects (like interactIVe [3]) and
corresponds to the naturalistic behavior that is applied when accomplishing
a given task, namely the action points "sense", "plan" and "act". As a baseline,
DESERVE considers the results of several research projects, like interactIVe,
but targets the standardization of the software architecture.

Indeed, by handling the sensor and actuator information on a virtual and
abstract level, a systematic standardization of input and output interfaces can
be realized. This both results in a well-encapsulated module architecture
and makes the exchange or addition of further module components much easier.


³ interactIVe, FP7/ICT funded project, www.interactIVe-ip.eu


Figure 9.10 DESERVE platform.


In particular, the Perception Platform processes the data received from
the sensors that are available on the ego vehicle and sends them to the
Application Platform. The data received by the Application Platform are
used to develop control functions and to decide the actuation strategies.
Finally, the output is sent to the IWI Platform, which informs the driver in
case of warning conditions and activates the systems related to the
longitudinal and/or lateral dynamics.
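
The data flow between the three platforms can be pictured as a chain of three functions ("sense", "plan", "act"). The sketch below is purely illustrative: the data classes, field names, dictionary keys and thresholds are hypothetical and do not reproduce the DESERVE interfaces.

```python
from dataclasses import dataclass


@dataclass
class PerceptionData:
    gap_m: float               # distance to the object ahead
    closing_speed_mps: float   # positive when the gap is shrinking


@dataclass
class Decision:
    warn_driver: bool
    brake_request_mps2: float


def perception_platform(raw):
    """Sense: abstract raw sensor readings into sensor-independent quantities."""
    return PerceptionData(gap_m=raw["radar_range_m"],
                          closing_speed_mps=-raw["radar_range_rate_mps"])


def application_platform(p, warn_ttc_s=3.5, brake_ttc_s=1.5):
    """Plan: decide on warning and actuation from the perceived situation."""
    ttc_s = p.gap_m / p.closing_speed_mps if p.closing_speed_mps > 0 else float("inf")
    return Decision(warn_driver=ttc_s <= warn_ttc_s,
                    brake_request_mps2=5.0 if ttc_s <= brake_ttc_s else 0.0)


def iwi_platform(d):
    """Act: turn the decision into HMI messages and actuator commands."""
    commands = []
    if d.warn_driver:
        commands.append(("hmi", "collision warning"))
    if d.brake_request_mps2 > 0.0:
        commands.append(("brake", d.brake_request_mps2))
    return commands
```

Because each stage only depends on the data class it receives, a sensor or an actuator can be exchanged by changing a single function, which is the encapsulation benefit argued for above.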


9.2.4.1 Perception platform 

The main objective of the Perception layer is to define and develop the
DESERVE platform components that will interface with sensors and actuators,
acquiring information from the typical sources. All these possible information
sources are addressed, described and characterized at an abstract level that
allows virtualization of input and output data. By using such an abstract and
virtual intermediate layer, the connection/exchange of sensors or actuators and
the porting or adaptation to different vehicle models is expected to become
much easier and less time consuming.

The DESERVE Perception layer is composed of different sub-layers that
build up, in their totality, the complete information source that can be imported
into the DESERVE platform framework. In a generalized sense, the Perception
layer can be seen as the input and output (I/O) gateway, especially when
including communication devices and the different actuators as part of the I/O
components.


9.2.4.2 Application platform

Based on these assumptions and previous works, a control strategy for
sharing vehicle control between the driver and embedded ADAS systems was
proposed. These layers can be used dynamically, based on the information
from the driver monitoring automotive (DMA).

Since the driver is legally responsible for operating the car in its
environment, in our approach he/she will have the final responsibility in the
arbitration control process. However, if the driver is not able to drive, then
the control will be taken by the embedded system.

The specific Application modules used in the arbitration and control of the
vehicle are:


• Threat assessment: the information from the Frontal Object Perception,
Vehicle Trajectory and Driver Intention modules is considered in
order to establish a risk level in each scenario.

• IWI manager: this module determines the action to be taken by
the driver or the vehicle (here we can set the Arbitration and Control
functions). The Driver Assistance Systems involve two main decision
makers, and the module decides when the driver takes control, when the
automated system does, and to what extent.

• Vehicle control: only the brake pedal is considered, using classical
control techniques that respect comfortable/safe accelerations.
Longitudinal control based on PID and fuzzy logic controllers has been
used in automated functions.
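
As an illustration of the classical control techniques mentioned in the last item, the following sketch shows a discrete PID controller whose output is limited to a comfortable braking range. The gains, the limits and the simple anti-windup rule are assumptions chosen for the example and are not the controller used in DESERVE.

```python
class PID:
    """Discrete PID controller with output limits and simple anti-windup."""

    def __init__(self, kp, ki, kd, out_min, out_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self._integral = 0.0
        self._prev_error = None

    def step(self, error, dt):
        """error: ego speed minus target speed (m/s); returns a brake request (m/s²)."""
        derivative = 0.0 if self._prev_error is None else (error - self._prev_error) / dt
        candidate_integral = self._integral + error * dt
        u = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        clamped = max(self.out_min, min(self.out_max, u))
        if clamped == u:
            # Anti-windup: accumulate the integral only while not saturated.
            self._integral = candidate_integral
        self._prev_error = error
        return clamped
```

For example, `PID(kp=0.5, ki=0.0, kd=0.0, out_min=0.0, out_max=3.0)` never requests more than 3 m/s² of deceleration and never requests acceleration, consistent with a function that only acts on the brake pedal.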


The level of assistance provided by the automated car to the driver might 
change depending on the driver's state and on the situation at hand (imminence 
of danger). With a varying level of automation of the automated vehicle, 
control might smoothly flow from the driver to the automated car and vice 
versa. 


9.2.4.3 Information Warning Intervention (IWI) platform 
The Information Warning and Intervention module uses the output of the
Application layer and provides ways to execute the interaction with the driver
and the control of the vehicle. Mainly, the information is sent to the
actuators, which translate high-level commands into acceleration and steering
angle to provide the response expected from the vehicle.

In a similar way, information is sent through the HMI to the driver
if necessary. These messages warn and inform the driver (visual and
acoustic signals/messages), as well as interact with him/her (haptic signals).
In order for these messages to be effective, great efforts have been made in
HMI solutions, where the current hot topic is sharing the control with the
driver. In the following, a review of some techniques for arbitration and
shared control is presented.


9.3 Survey on Arbitration and Control Solutions in ADAS 


In the transportation field, human machine interaction plays a key role.
Nowadays, significant results have been achieved in the automated driving
field (at least, under certain circumstances) [4, 5]. Nonetheless, there is a
long way to go before removing the driver from the loop in real traffic
conditions.

Parasuraman et al. [7] stated that the main problem in this kind of
system lies in the decision-making process and the assignment of control
responsibility. In the ITS field, shared control is the action of carrying out
a task simultaneously between an (on-board) computer and a driver, differing
from manual control and full automation (since no real "sharing" is being
done in these situations, see Figure 9.12).

The first levels of automation were set by Sheridan in [9]. Here, 10
different levels described the amount of responsibility for each decision maker.
Flemisch et al. in [10] present a more developed view of the levels needed for
control sharing, where the automation is based on the H-metaphor and
classified into two main groups: tight rein and loose rein.

Recently [11], a new taxonomy of automated driving was issued by SAE
International; its control levels are depicted in Figure 9.12. Other levels of
automation have already been proposed by the German Federal Highway
Research Institute (BASt) [12] and the National Highway Traffic Safety
Administration (NHTSA) [13]. A comparison of these is summarized in [11],
stating that the SAE taxonomy is similar to the other two, but gives a broader
and more specific view of automation levels. For this reason, the SAE taxonomy
will be the one taken into account (see Figure 9.12).

When considering the driver in the control loop, it is important to know
the automation level embedded in the vehicle. This will permit the control
sharing system to set the limits for each decision maker. We will examine the
arbitration concept in more depth as a way to smoothly change the level of
control according to the situation at hand.


Figure 9.12 SAE J3016 standards of driving automation levels for on-road vehicles.

The Arbitration concept is the process of settling an argument or a
disagreement by an entity that is not involved.⁴ Little research has been done
in terms of arbitration (since it is a new concept in vehicle automation).
First approaches define cognitive states and relations between humans and
machines [6]; mental models as in human relationships have also been
considered by [14]. This consideration leads to a scenario where the status
of the driver and the system must be known at all times, in order to set an
accurate level of automation for the current situation.

From the above, communication between the system and the driver
should occur constantly, in a way that makes it possible for both to build a
mental model of one another [14]. Different metaphors have also been
stated, such as the copilot metaphor (referring to the automated system)
and the H-metaphor, a comparison between horse-human cooperation and
vehicle-human cooperation [15].


9.4 Human-Vehicle Interaction 


An increasing need to pay more attention to the human driver in interaction
with the vehicle has recently been identified [1]. From other domains where
automation is already widely used (e.g. aviation, control rooms) it is known
that automation has both positive and negative effects on the human operator.
With increasing automation in the vehicle domain, these effects need far
more attention in the short term, evaluating the human-vehicle relationship
and assigning countermeasures if necessary [1]. In order to have regular
communication between the two decision makers (the driver and the embedded
system), a haptic HMI system is proposed in [15], where active force feedback
is the common language. This allows the message to be directly linked to the
actuator where the reaction of the driver is expected, also allowing the system
to evaluate the performance of the driver. The haptic feedback can also give
hints about the action the driver should perform (e.g. the steering wheel
turns a little to the right or left in order to prompt the driver).

Haptic systems have been implemented widely across the literature: in
gas pedal feedback [16, 17] and in steering wheel feedback [18, 19]. These
are also used in training simulators, improving the performance of drivers in
different scenarios.

The use of corrective feedback is known to cause over-corrective behavior
[8] or bad performance when it is removed. This happens because it impairs the
input-output relationship in the driver's motor skill learning. In [20], the
haptic aid shows good performance if the feedback is provided as needed
and not all the time.


⁴ Oxford dictionary.

For arbitration and shared control, an estimate of the driver's state is
needed in order to know his or her current ability to perform the driving task.
In [21], an extensive study on driver distraction was performed. It showed
that, in terms of visual and cognitive attention sharing, a warning from the
HMI proved to be helpful while performing following or passing driving
maneuvers.

In [22], the importance of vision in the driving task was stated.
Although visual acuity proved to be important, other indicators of driver
ability (visual field, processing speed, divided attention, among others) have
an evidence base for their relevance to driver ability and safety, and can
be measured in a noninvasive way with recent in-car perception systems,
as in [23].

Recently, the HAVEit⁵ project [24, 25] and the interactIVe⁶ project [26]
have made the first approaches to control sharing strategies, theoretically
and in simulations, with driver-in-the-loop capabilities.

The aim of arbitration and control solutions in ADAS inside the
DESERVE project is to effectively share the control with the driver and
manage risky situations. In [27], ADAS applications are listed, such as lane
change assistance systems, pedestrian safety systems, adaptive light control,
and parking assistance systems, among others. These are considered to
improve the automated system and take into account the driver-in-the-loop
for arbitration applications [28].

Arbitration for shared control applications is a new concept in
the ITS research field. Based on previous contributions, the objective is
to develop a system able to share the control smoothly between
the decision makers. Motivation for this approach can be found in social
needs [29], legal challenges [1, 33] and technical bases such as the DESERVE
platform (see [11]).


⁵ http://haveit-eu.org/
⁶ http://www.interactive-ip.eu/


9.5 Driver Monitoring


A driver's limitations are very often related to his or her physiological and
psychological states. An optimum pilot state includes an optimum alertness
level and a task-oriented attentiveness. The distinction between “alertness”
and “attention” is justified in that driver “alertness” is presumed to be
necessary but not sufficient for an appropriate focus on external events. Thus,
drivers may be alert but still be inattentive. In order to assess alertness and
attentiveness in the DESERVE project, two main factors are evaluated:


• Drowsiness/fatigue
• Distraction


Up to now, a universally valid definition of drowsiness is still lacking.
Driver tiredness mainly derives from performing a highly demanding task for
extended time periods (“time-on-task” for the driving effort). Other
definitions focus on the sleepiness level, which is the state of being ready to
fall asleep. It is mainly caused by circadian rhythms and sleep disorders
(reduced quality or quantity of sleep).

On the other hand, “Driver distraction refers to those instances when a
driver's attention is diverted from the primary task of driving the vehicle in a
way that compromises safe driving performance” [30]. This distraction can
be either internal (e.g. interaction with other passengers, cellphone, etc.) or
external (e.g. other road users, traffic signs, etc.). It can also be classified
into different modes: visual (external attractors, for example an advertisement
on the side of the road, or internal attractors, e.g. looking at children in the
back of the vehicle, displaying an address on a navigation device, etc.),
acoustic (ringing phone, listening to music) or cognitive distraction
(conversing on the phone, but also internal thought and rumination, etc.).

For more information about on-line driver monitoring approaches, the
reader is referred to [34], where the different on-the-market and research
methods and approaches are described in detail. In the DESERVE
project, two main approaches were taken into consideration for the assessment
of alertness and attentiveness of the driver:

The Continental driver supervision system is implemented for real-time
monitoring of two independent parameters, the drowsiness level (sleepiness
vs. awakeness) and the visual inattention (e.g. the driver “is/is not”
looking at the road) [23].

The driver state monitoring includes a compact, low-consumption and
high dynamic range (120 dB) CMOS camera sensor. The camera is equipped
with a global shutter for synchronization with a set of pulsed NIR
lights (850 nm).

Ficosa's Somnoalert Sensor aims to detect “not apt to drive” states using
physiological signals such as the thoracic effort signal. An external thoracic
effort sensor sends the signals to a smartphone, where they are processed to
evaluate the state of the driver and indicate if it becomes dangerous.


9.5.1 Legal and Liability Aspects 


For automated vehicles, it is still unclear how legal and liability aspects are
going to evolve. As a matter of fact, U.S. legislation neither prohibits
nor allows the use of automation in the driving task [31]. This leaves an
important legal gap regarding the responsibility for any action taken by the
on-board system, since it is now an entity that “thinks for itself”. Similar
situations arise in Europe, where in a crash the responsible party is at all
times the driver, even when an embedded system was controlling the
vehicle [32].

From the legal perspective, several initiatives in the U.S., specifically in
the states of Nevada (2011), Florida (2012), California (2012), Washington
D.C. (2012) and Michigan (2014), have already established some of the
minimum safety requirements in order to allow automated vehicle technology
[33]. Other state legislatures in the U.S. are following these initiatives;
for a wider view, the reader is referred to [32] and [33]. In
the E.U., initiatives launched between governments and manufacturers are
currently creating the framework for the new standards and regulations for
automated driving. These address legal matters and promote the
standardization of automated vehicle technology, as for example the
Citymobil2 project [36].

As to liability, Beiker and Calo [35] noted that the situation is more
complex with automated vehicles, concluding that it is unclear how the courts,
or the public, will respond to the prospect of artificial intelligence acting on
behalf of humans with fatal consequences. They expect that a set of policies
can be established to create the necessary legal framework for further
development of vehicle automation. In the E.U., the legal framework assigns
the liability for any crash to the driver. This creates many barriers for
automated vehicles and restricts them to private roads.

As a matter of fact, automation (or the lack of it) is not black or white
but rather shades of gray, complex and involving many design dimensions
[1]. OEMs are careful with this and do not claim that an ADAS works in
all driving situations. A helpful model of automation is to consider different
levels of assistance and automation that can, e.g., be organized on a scale as
in [11]. This not only suggests but encourages the use of systems that consider
the driver-in-the-loop. These systems will allow the industry to add the
driver's vigilance to their system's supervision and avoid gaps (at least in
the legal framework).


9.6 Sharing and Arbitration Strategies: DESERVE Approach


The arbitration module is defined in the information, warning, intervention 
(IWI) manager (Application platform) of the DESERVE abstraction layer 
(Figure 9.11). This Advanced Driver Assistance System involves two main 
decision makers: the driver and the automated system. It will determine the 
level of responsibility of each of them at all times and allow smooth transitions 
between automation levels defined in [11]. 

Based on the information from different perception systems, it is possible
to define fuzzy control parameters to achieve this, as was proposed in [37, 38].
This cognitive process will result in the selection of a course of action among
several alternative scenarios (e.g., to what extent the driver should be
responsible for the pedal action in an ACC maneuver while tired). The proposed
system consists of a two-level fuzzy approach for the arbitration (IWI manager)
and vehicle sharing (VMC) modules.

The arbitration and sharing control concept has been developed
in RTMaps, one of the development platforms defined in DESERVE.
Figure 9.13 shows the general diagram for the arbitration. Here a fuzzy logic
approach is implemented to compute the automation assessment (or situation
status of decision-makers). This value is an assessment of the alertness and
attentiveness of the driver w.r.t. the risk detected from the situation status.


Figure 9.13 Arbitration and control sharing application: General diagram.


The sharing controller considers the automation assessment (but also the
driver's and the automated system's decisions) to decide the level of control
and responsibility of each decision maker in real time. The output then goes to
the HMI, informing the driver (a haptic steering wheel system informs the
driver of the next maneuvers that the system is ready to perform), and to the
vehicle control. This process is done in real time, allowing smooth sharing
between decision-makers. For details and a further perspective on the first
preliminary results, please refer to [38].
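
To make the two-level idea more concrete, the sketch below computes an automation assessment from a driver fitness score and a risk score with a small fuzzy rule base, and then uses that value to blend the driver's and the system's commands. The membership functions, the four rules, the weighted-average defuzzification and all names are assumptions made for illustration; they are not the rule base of [37, 38] nor the RTMaps implementation.

```python
def ramp_up(x, lo, hi):
    """Membership degree rising linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def automation_assessment(driver_fitness, risk):
    """First level: share of control given to the automated system, in [0, 1].

    driver_fitness: 0 (drowsy or distracted) to 1 (alert and attentive).
    risk:           0 (no risk) to 1 (imminent danger).
    """
    fit = ramp_up(driver_fitness, 0.3, 0.7)
    unfit = 1.0 - fit
    high = ramp_up(risk, 0.3, 0.7)
    low = 1.0 - high
    # Each rule: (firing strength using min as AND, automation level it proposes).
    rules = [
        (min(fit, low), 0.0),     # fit driver, low risk: driver in control
        (min(fit, high), 0.5),    # fit driver, high risk: shared control
        (min(unfit, low), 0.5),   # unfit driver, low risk: shared control
        (min(unfit, high), 1.0),  # unfit driver, high risk: system in control
    ]
    total = sum(weight for weight, _ in rules)
    return sum(weight * level for weight, level in rules) / total


def shared_command(driver_cmd, system_cmd, automation):
    """Second level: blend both decisions according to the assessment."""
    return (1.0 - automation) * driver_cmd + automation * system_cmd
```

Because the assessment varies continuously with the inputs, the blended command changes gradually as the driver's state or the risk evolves, which is one simple way to obtain the smooth transition of control described above.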


9.7 Conclusions 


This chapter presents a survey on arbitration and control solutions for ADAS,
based on the ADAS solutions available in the market and the ones considered
from the functional requirements described in Sub-Project-1 of the DESERVE
project. The main architecture is described as a three-pillar platform system:
first “sensing” the environment, then “planning” according to decisions made
over perception data, and finally “acting” to follow those decisions.

For the sharing and arbitration approach, different points of view have been 
considered. Here, the estimation of the driver state and the assessment of the 
risk related to the situation in hand are the most important ones. These allow the 
system to have a coherent evaluation of the situation of both decision makers 
and arbitrate if the vehicle's embedded system needs to intervene because of 
risky driver actions. 

This intervention is performed through haptic signals. However, there are 
still some challenges with respect to HMI solutions that can properly work as a 
communication bridge between the two decision makers and inform the driver, 
in time, of the automated vehicle's decisions. 

Furthermore, legal and liability aspects are important milestones yet 
to be tackled. Although some states of the U.S. are taking the initiative, 
legislation regarding automated vehicles is still in its first steps. Liability and 
legal responsibility still lie with the driver; hence, in our approach the control 
lies with the driver (the driver can deactivate the system at any stage and can 
always overpower the haptic cues). Future research will focus on the arbitration, 
to determine (using some perception information) up to which point the 
embedded system can take control of the vehicle and which situations are 
more dangerous (risk management, taking special care of situations where 
overreliance on the system occurs and the embedded system returns the control 
to the human driver). 


10.1 Introduction 


Traffic safety literature from the early 1970s showed that the majority of 
accidents are a consequence of human error. One of the pioneering works, carried 
out in 1977 in the automotive domain [34], started from an examination of 
a large number of accidents and showed that more than 90% of them were 
caused by different kinds of mistakes attributable solely to the human factor 
and only rarely to technical and/or environmental failures. 

This finding was confirmed in the following years in other domains 
with very complex technological contexts (e.g., avionics, railway). 

It was realized that in the framework of the evolution of technical systems, 
the human element plays a fundamental role both as a governing factor 
and as a potential menace to safety. This concept paved the way for the 
modern preventive safety systems, widely known as ADAS (Advanced Driver 
Assistance Systems). 

The experience gained in the DESERVE project (Development 
Platform for Safe and Efficient Drive) was agreed by all involved partners 
to be beneficial for the extension of future ADAS. A key role in this process 
is played by the Human Machine Interface (HMI). Since ADAS take part in 
the driving task by influencing the driver's decisions or by directly intervening 
in the driving maneuver, the issue of the driver's trust opens a crucial design 
problem, because the driver cedes a part of the control [30]. Low trust, resulting 
e.g. from an earlier experience of failure, can lead to disuse of the system 
[24]. Building and reinforcing the driver's trust through a positive experience 
of the system depends not only on the proper functioning of the system itself 
(i.e. the capability of detecting some events) but also on the HMI design. 

In order to create those positive experiences and avoid the disuse of ADAS, 
one has to understand the driver and his/her goals and motives while driving 
[13], together with the role of technology in supporting the driver in his/her 
task and in avoiding road accidents. 

This chapter explores step by step the rationale behind the 
effective design of the Human Machine Interface for ADAS, giving 
the reader an outline of the role and scope of ADAS. The next sections 
focus on the role of humans and the role of technology in 
preventing road accidents, and discuss 
the importance of detecting the driver's intention. Then an example of a 
complete HMI design process is presented: during the DESERVE project, 
the in-vehicle HMI for 17 functions (13 of them ADAS) was designed and 
evaluated. The chapter reports how the HMI was conceived, 
including a discussion of the role of ADAS in preventing imminent accidents 
and a short state of the art on HMI design approaches. 


10.2 Prevent Imminent Accidents: The Role of Humans, 
the Role of Technology 


In general, the number of accidents has progressively decreased 
since the second half of the 1980s [8]. This is due partly to a 
greater human awareness of accident causes, influenced by 
the evolution of human factors studies, but mostly to 
technological innovation in vehicles. The history of this evolution, which 
shows the relationship between the run-up to the accident and the 
technologies and functions for safety enhancement, is presented in the next 
section. 


10.2.1 From Passive to Preventive Safety 


The first phase in reaching a higher degree of vehicle safety was due to the 
introduction of the so-called passive safety systems, whose main purpose is 
to improve the protection of the driver while an accident takes place. Indeed, the 
introduction of safety belts, airbags, etc., as well as the strengthening of the 
materials, have significantly reduced the number of injuries and consequently 
the number of victims on the road. For instance, studies on the effectiveness 
of seat belts have been conducted since the end of the 1960s, starting in 
Sweden [4]. 

The second phase was characterized by the introduction of active safety 
systems, which were intended to increase the safety of the driver when 
approaching a dangerous situation. In particular, this period dealt with the 
introduction of systems such as the ABS (Anti-lock Braking Systems), the 
ESC (Electronic Stability Control), as well as other functions able to intervene 
in the proximity of a potentially dangerous situation to minimise its impact and, 
hence, to avoid the accident. For instance, cars equipped with ESC were 
22% less likely to be involved in crashes than those without, with 32% and 
38% fewer crashes in wet and snowy conditions [19]. 

The challenge of reducing the number of accidents even further lies in 
the development of the so-called preventive safety technology, which 
is conceived to assist the driver when the risk of a hazardous and 
critical situation is getting higher. These technologies, named ADAS (Advanced 
Driver Assistance Systems), are able to monitor the driving dynamics and 
introduce preventive features in support of the driving activity. In particular, 
driving safety is fostered on the longitudinal axis of the vehicle by, for 
instance, frontal collision warning and adaptive cruise control systems. 
Driving safety on the lateral axis can be improved by systems like lane support 
and lane warning, while blind spot systems improve safety 
towards the rear of the vehicle. The purpose of this approach is twofold. On one 
hand, it is intended to guarantee a high level of protection on the road, 
almost as if the driver were inside a kind of “safety bubble”, as 
highlighted by some researchers when referring to the concept of a "virtual 
safety belt" [31]. On the other hand, it aims at allowing cars to operate in 
coordination, implementing a scenario where all vehicles have 
high situation awareness capabilities. The effectiveness of 
ADAS for driving safety is beyond doubt, even though most of them have not yet 
reached maturity in the vehicle market and are still in the prototyping phase. 
Nevertheless, research has shown that an increase in the automation 
and accident prevention features included in on-board technologies does 
not always correspond to an increase in driving confidence, especially if the 
drivers' expectations about the interaction with vehicle technologies are not fully 
taken into consideration by designers. On the other hand, a theory known as the 
Peltzman effect [25] suggests that an improvement in confidence due to effective 
automated safety support systems, even if they only enhance the 
monitoring of the driving scenario, could induce drivers to increase, for instance, 
their speed, to the point of jeopardizing the effectiveness of such systems. 


10.2.2 The Role of Driver Model in ADAS Design 


As mentioned above, Advanced Driver Assistance Systems (ADAS) have been 
increasingly deployed in recent years in the automotive industry, in 
order to move from passive safety to preventive safety. In this context, driver 
models are expected to provide a more complete understanding of driver behaviour, 
offering the opportunity to enhance road safety and to increase 
driver acceptance of in-vehicle advanced systems by designing ADAS 
that are better suited to drivers. As a practical example, the Lane 
Departure Warning (LDW) warns the driver when the left/right lane marking is crossed 
without using the indicator. However, blinkers are used only half the time 
before a lane change [18] and, therefore, the LDW might warn the driver 
in situations in which s/he is in full control of the vehicle (for example, 
during an overtaking manoeuvre without the blinker activated), causing a nuisance to the 
driver. If this situation occurs frequently, the driver might get so annoyed by 
the system that s/he deactivates the LDW, eliminating the possible safety 
benefit brought by the system. If human behaviour could be modelled 
more precisely, it would be possible to discriminate between an intentional 
and an unintended lane crossing, with the LDW warning the 
driver only in the second case, and driver acceptance of the LDW could be 
increased. Similar examples could be found for other ADAS such as Forward 
Collision Warning (FCW) and Blind Spot System (BDS). Moreover, the driver 
intention detection module might be used jointly with other systems to warn 
the driver about risky behaviours, or might be used for the communication 
with other ADAS. For instance, the lane change detection module could be 
combined with a surround vision system or with a blind spot information 
system to prevent the driver from performing a dangerous overtaking manoeuvre (if an 
oncoming vehicle is spotted and, at the same time, a lane change intention is 
detected). 
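
The decision logic described in this paragraph can be sketched in a few lines 
of Python. This is only an illustration under stated assumptions: the 
LaneState fields, the function names and the 0.5 intent threshold are 
hypothetical and do not correspond to the interfaces of the DESERVE modules. 

```python
from dataclasses import dataclass


@dataclass
class LaneState:
    crossing_lane_marking: bool  # lane tracker reports a marking being crossed
    indicator_on: bool           # turn indicator active on the crossing side
    lane_change_intent: float    # estimated probability of an intended lane change
    oncoming_vehicle: bool       # reported by a surround or blind spot system


INTENT_THRESHOLD = 0.5  # assumed value, would need tuning on real data


def ldw_should_warn(state: LaneState) -> bool:
    """Warn only for lane crossings that look unintended."""
    if not state.crossing_lane_marking or state.indicator_on:
        return False
    return state.lane_change_intent < INTENT_THRESHOLD


def overtaking_should_warn(state: LaneState) -> bool:
    """Warn when a lane change is intended while an oncoming vehicle is detected."""
    return state.oncoming_vehicle and state.lane_change_intent >= INTENT_THRESHOLD


# Overtaking without blinker: a plain LDW would warn, the intent-aware one does not.
overtaking = LaneState(True, False, lane_change_intent=0.9, oncoming_vehicle=False)
# Drifting out of the lane: no intent detected, so the warning is issued.
drifting = LaneState(True, False, lane_change_intent=0.1, oncoming_vehicle=False)
print(ldw_should_warn(overtaking), ldw_should_warn(drifting))
```

The same intent estimate feeds both functions, which is the sense in which the 
intention detection module can be shared among several ADAS. 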

The Driver Intention Detection Module developed within the DESERVE 
project aims at modelling and predicting the driver's behavior at the tactical 
and operational levels of Michon's model [20]. Among the maneuvers 
taken into consideration for the prediction of the driver's intent, the most 
researched are lane changes, left/right turns, braking and lane 
keeping. Within the scope of the DESERVE project, the focus is placed on 
the prediction of lane changes (and possibly of overtaking) with the final aim 
of improving the acceptance of ADAS. If a reliable lane change intention 
detection were available, warnings could be issued only when needed: ADAS designed 
in such a way could increase driver acceptance and achieve a higher benefit 
with respect to road safety. 

In the field of lane change intention detection, several studies have 
already been performed. One of the main authors on this topic is Salvucci, 
who applied the model tracing technique, combined with a computational driver 
model, to detect the driver's intention to change lane [29]. Model tracing 
techniques were originally used in intelligent tutoring to predict students' possible 
next steps in problem solving. In [29], data from the vehicle 
(steering wheel angle, accelerator depression, lateral position, longitudinal 
distance and time headway to a lead vehicle, longitudinal distance, front and 
back, to vehicles in adjacent lanes) and from the environment (presence or 
absence of a lane to the left and right of the current travel lane) were used 
to build the model. Based on this information, the model calculates a desired 
steering angle and accelerator position. The model performed well when 
tested both in the driving simulator and in the real vehicle, reaching a reliable 
detection of the maneuver after 1 second. 

In a later research work [6], the authors developed and implemented a real-time 
lane change intent detection system which could go beyond the traditional 
offline implementation. The authors made use of information collected from 
the vehicle (steering wheel angle, yaw rate and blinker state signal), the 
Adaptive Cruise Control (distance to the lead vehicle, relative speed, time 
gap to the vehicle in front and difference between the current speed and 
the desired speed), the Lane Departure Warning (vehicle lateral deviation, 
lane curvature and vehicle yaw angle), the Side Warning Assist (occupancy 
and speed state within a critical zone) and the head position (head motion, 
head yaw and head pitch), adopting a time window of 2 seconds to trace 
past events. A classifier based on relevance vector machines (RVM) was used 
to detect the lane change intent. The results show that, for a good prediction of 
the lane change intention, the inputs from the direct observation of the driver 
(head-viewing camera) are relevant and improve the quality of the classification 
(detections are unreliable beyond 3 seconds). 

In a later article [15], a multiclass Support Vector Machine (SVM) algorithm 
combined with a Bayesian filtering approach was used to predict lane change 
intention. The variables used as inputs for the algorithm were the lateral 
position of the vehicle (obtained from a lane tracker system), the steering 
angle, the first derivative of the lane position and the first derivative of 
the steering angle. The task was formulated as a multiclass classification 
problem with three possible outcomes: left lane change, right lane change and 
no lane change. On top of the multiclass classifier, a Bayesian Filter (BF) was 
used in order to improve the reliability of the predictions. The comparison 
between the SVM algorithm alone and the combination of SVM and BF shows that, 
in the first case, many false alarms were observed, whereas adding the Bayesian 
Filter increased the precision and reduced average prediction times. Most of 
the lane changes are predicted about 1.3 seconds before the lane crossing, with 
a maximum prediction horizon of 3.3 seconds. The authors reported that further 
improvements might be brought by the inclusion of other variables, such as the 
distance to the vehicle in front and the speed difference with the vehicle in front. 
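
To make the role of the filtering stage more concrete, the following Python 
sketch applies a generic discrete Bayes filter to per-frame classifier 
outputs for the three classes (left lane change, no lane change, right lane 
change). It is not the formulation used in [15]: the transition model, the 
stay probability of 0.9 and the example confidence values are assumptions 
chosen for illustration, and the classifier itself is replaced by hand-written 
confidence vectors. 

```python
CLASSES = ("left", "none", "right")


def bayes_filter_step(belief, likelihood, stay=0.9):
    """One predict/update cycle of a discrete Bayes filter.

    belief:     current probability of each class (sums to 1)
    likelihood: per-frame classifier confidence for each class
    stay:       assumed probability that the intention does not change between frames
    """
    n = len(belief)
    switch = (1.0 - stay) / (n - 1)
    # Prediction: intentions tend to persist from one frame to the next.
    predicted = [
        sum(belief[j] * (stay if i == j else switch) for j in range(n))
        for i in range(n)
    ]
    # Update: weight the prediction by the new classifier output.
    posterior = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(posterior)
    if total == 0.0:
        return predicted  # uninformative frame, keep the prediction
    return [p / total for p in posterior]


def filter_sequence(likelihoods):
    """Return the most probable class after each frame."""
    belief = [1.0 / len(CLASSES)] * len(CLASSES)
    labels = []
    for likelihood in likelihoods:
        belief = bayes_filter_step(belief, likelihood)
        labels.append(CLASSES[belief.index(max(belief))])
    return labels


frames = (
    [[0.1, 0.8, 0.1]] * 5      # steady lane keeping
    + [[0.6, 0.3, 0.1]]        # one isolated "left" output (possible false alarm)
    + [[0.1, 0.8, 0.1]] * 2    # lane keeping again
    + [[0.7, 0.2, 0.1]] * 4    # sustained evidence of a left lane change
)
print(filter_sequence(frames))
```

With these example values the isolated "left" frame does not change the 
filtered decision, while sustained evidence eventually does, which is the 
mechanism by which a filter of this kind can suppress false alarms. 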

Overall, despite the knowledge acquired concerning the prediction of the 
driver's intention to start a lane change, the topic remains open because 
lane change intention detection has proven to be extremely challenging. In 
particular, to obtain a more reliable prediction of driver intent, three aspects 
should be considered: 


• to increase the precision of the prediction algorithms; 

• to extend the detection time prior to the lane change; 

• to decrease the number of variables needed to predict the lane change (not all the 
sensors used in the previous studies are available in common vehicles). 


In addition, as pointed out by previous research [7], several aspects 
should be considered when designing a study on the prediction of the driver's 
intention: 


• type of inputs to be used: CAN data (steering wheel angle, pedal position, 
turn indicator), lane position sensor/camera (lateral lane position and 
standard deviation) and sensors for behavior data (head motion, eye 
motion, foot and hand positions); 

• type of algorithm to be adopted for the analysis: Support Vector Machines, 
Bayesian Nets, Hidden Markov Models; 

• material to be employed for the experiment: real vehicle (naturalistic or 
imposed) or driving simulator. 


Regarding the first aspect, the results highly improve when measures of driver 
behavior are included, especially the head motion. However, this information 
is, usually, not available in common vehicles and, therefore, this feature should 
be further analyzed. 


10.3 HMI Design Flow: The DESERVE Approach 


In order to develop an HMI concept for ADAS capable of generating positive 
experiences during the driving task, a design workflow of 5 steps was used: 


1. Collecting the state of the art and the latest trends in automotive HMI 
design; 

2. Defining three different HMI concepts; 

3. Preliminary testing of the three HMI concepts with a focus group; 

4. Testing the best 2 concepts in a user test at the driving simulator; 

5. Defining the final concept. 


The HMI was designed to allow adaptation strategies that take into 
account the inputs provided by the driver model. 


10.3.1 Different Approaches in the HMI of the Preventing 
Warning Systems: A State of the Art at a Glance 


From the point of view of the on-board human machine interface associated 
with the different types of accident prevention systems, the evolution of HMI for 
ADAS can be clustered in three main phases. 

The first era of HMI for accident prevention systems can be named the 
warning era. Most of the active and preventive systems mentioned above, 
which are not expected to be automatically actuated, are in the end 
warning-based systems, as they are aimed at increasing the driver's awareness 
with the support of technology. The corresponding HMI is therefore 
based on alerts and aimed at immediately notifying the driver of potential 
risks, so as to restore a safe situation for the driver. 

The second phase coincides with an important transformation induced 
by the evolution of active and preventive safety systems, moving from being 
activated only by on-board sensors to a larger spectrum of sensors including 
the vehicle itself, other vehicles (Vehicle to Vehicle - V2V) and the infrastructure 
(Vehicle to Infrastructure - V2I). This technological evolution is creating the 
so-called cooperative ADAS perspective [16], where the preventive capabilities 
of such systems are enabled by the connection with the infrastructure. In terms 
of HMI, it is evident that vehicles are not necessarily and exclusively oriented 
towards warning-based interfaces. Although this 
mechanism tends to persist, and remains necessary, it is also evident that 
within a system characterized by a high level of cooperation, the warning-based 
system might easily be replaced by a recommendation-based mechanism. 
In other words, if vehicles are able to mutually recognize each other, as 
well as to cooperate by exchanging information and data, the system 
supporting the driver will be aimed at sharing behavioural choices among 
the cars, rather than imposing and reporting imminent dangers. The D3COS EU 
project (www.d3cos.eu), among its results, first proposed this 
promising concept in HMI for accident prevention systems [29]. This new 
dimension represents a real paradigm shift towards an increasing 
level of automation. 

The third phase is characterized by the integration between the cooperative 
and the warning-based dimensions on one side, and the increased level of 
automation in cars (according to SAE Standard J3016) on the other. In 
this situation, the expected HMIs will raise even more complex issues. Firstly, 
while it is true that automation will free the driver from the 
necessity of constantly driving the vehicle, the driver 
is obliged to continuously monitor the correct functioning of the whole 
system. In a pioneering work, [1] expressed the idea of a sort of irony 
hiding behind the concept of automation. In fact, while in theory 
the purpose of automation is to exclude the user from the driving tasks, in 
practice autonomous systems tend to require even more participation 
from the driver, who must continuously monitor the correct functioning of the 
mechanism. The more autonomous the vehicle is, the more the driver's 
responsibility is reduced to monitoring alone, and the design issue for HMI 
designers is how to provide the best monitoring and to re-allocate the control 
to the driver in the most effective and quickest way. 


10.4 HMI Concepts Design 


The three HMI concepts developed within the DESERVE project included the 
information normally displayed in the dashboard (e.g., speedometer, odometer, 
fuel level and water temperature information, diagnostic telltales), ADAS 
information support (e.g., lane change assistance system, night view, parking 
aid, adaptive cruise control, as well as the drowsy driver alert system) and 
navigation information. 

Moreover, particular attention was dedicated to the layout design of 
the drowsy driver alert system. Drowsiness detection can be used to give 
a direct warning to the driver (explicit drowsiness) or as an input for an HMI 
reconfiguration strategy (implicit drowsiness). These two different strategies 
for drowsiness management were applied to all three HMI concepts, 
hence obtaining 6 concepts to test. For the explicit drowsiness, a warning is 
delivered to the driver with an icon and a message. For the implicit drowsiness, 
the ADAS sensitivity is set to the highest level. Once the driver takes a break, the 
ADAS configuration s/he set before is restored. 
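
The two drowsiness management strategies can be summarized with a small state 
holder, sketched below in Python. The class name, the three-level sensitivity 
scale, the function names used as dictionary keys and the warning text are all 
hypothetical; the sketch only mirrors the behavior described above (warning 
for the explicit strategy, maximum sensitivity and later restoration for the 
implicit one). 

```python
HIGHEST_SENSITIVITY = 3  # assumed scale: 1 (low) to 3 (high)


class DrowsinessManager:
    def __init__(self, strategy, adas_sensitivity):
        if strategy not in ("explicit", "implicit"):
            raise ValueError("strategy must be 'explicit' or 'implicit'")
        self.strategy = strategy
        self.adas_sensitivity = dict(adas_sensitivity)  # function name -> level
        self._saved = None  # driver configuration saved while drowsy

    def on_drowsiness_detected(self):
        """Return a warning (explicit) or reconfigure the ADAS (implicit)."""
        if self.strategy == "explicit":
            # Icon identifier and message text are placeholders.
            return {"icon": "drowsiness", "message": "Drowsiness detected, take a break"}
        if self._saved is None:
            self._saved = dict(self.adas_sensitivity)
        for name in self.adas_sensitivity:
            self.adas_sensitivity[name] = HIGHEST_SENSITIVITY
        return None

    def on_break_taken(self):
        """Restore the configuration the driver had set before the drowsiness event."""
        if self._saved is not None:
            self.adas_sensitivity = self._saved
            self._saved = None


manager = DrowsinessManager("implicit", {"lane_departure_warning": 1, "collision_warning": 2})
manager.on_drowsiness_detected()
print(manager.adas_sensitivity)   # both functions at the highest level
manager.on_break_taken()
print(manager.adas_sensitivity)   # driver configuration restored
```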

The user interface deploys 17 functions: 13 of them are ADAS, 2 are Safety 
Assistance Systems, and 2 are IVIS (In-Vehicle Information Systems), as listed 
in the following: 


1. Lane change assistance system (ADAS); 
2. Night vision system with pedestrian detection (ADAS); 
3. Rear view camera system (Safety Assistance); 
4. Surround view (Safety Assistance); 
5. Lane departure warning (ADAS); 
6. Pedestrian safety system (ADAS); 
7. Collision warning system (ADAS); 
8. Emergency braking ahead (ADAS); 
9. Rear approaching vehicle (ADAS); 
10. Adaptive high beam assist (ADAS); 
11. Adaptive cruise control (ADAS); 
12. Curve warning system (ADAS); 
13. Intelligent park assist (ADAS); 
14. Traffic sign recognition (ADAS); 
15. Driver impairment warning system (ADAS); 
16. Navi/Map info (IVIS); 
17. Setting menu (IVIS). 


10.4.1 Concept 1: Holistic HMI 


In the Holistic HMI concept all the HMI elements (I/O) are centralized in 
front of the driver. The Instrument Panel Cluster (IPC) is the main visual 
output channel, while the steering wheel (SW) is the main input channel. 


The HMI of Preventing Warning Systems: The DESERVE Approach 


The HMI elements are listed as follows: i) IPC display 12"; ii) SW 
commands; iii) Left stalk commands; iv) Buttons; v) Knobs. 

The instrument panel cluster was divided into three areas. In the central area 
the following information is delivered: lane change assistance system, night 
vision system with pedestrian detection, rear view camera system, surround 
view and setting menu. 

The left area is mainly dedicated to the hazard warnings: lane departure 
warning, pedestrian safety system, collision warning system, emergency 
braking ahead, rear approaching vehicle, adaptive high beam assist, adaptive 
cruise control, and curve warning system are displayed. 

In the right area the following information is delivered: intelligent 
park assist, traffic sign recognition, driver impairment warning system and 
navigation. 
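
The allocation of the 17 functions to the three areas of the instrument panel 
cluster can be written down as a simple lookup table, which also makes it easy 
to check that every function is assigned exactly once. The Python sketch below 
is an illustration only: the dictionary layout and the helper names are 
assumptions, and "Navi/Map info" is used for the entry that the text above 
calls navigation. 

```python
# Area -> functions, following the Holistic HMI description above.
HOLISTIC_IPC_LAYOUT = {
    "central": [
        "Lane change assistance system",
        "Night vision system with pedestrian detection",
        "Rear view camera system",
        "Surround view",
        "Setting menu",
    ],
    "left": [
        "Lane departure warning",
        "Pedestrian safety system",
        "Collision warning system",
        "Emergency braking ahead",
        "Rear approaching vehicle",
        "Adaptive high beam assist",
        "Adaptive cruise control",
        "Curve warning system",
    ],
    "right": [
        "Intelligent park assist",
        "Traffic sign recognition",
        "Driver impairment warning system",
        "Navi/Map info",
    ],
}


def all_functions(layout):
    """Flatten the layout into a single list of function names."""
    return [name for functions in layout.values() for name in functions]


def area_of(layout, function_name):
    """Return the IPC area in which a function is displayed."""
    for area, functions in layout.items():
        if function_name in functions:
            return area
    raise KeyError(function_name)


functions = all_functions(HOLISTIC_IPC_LAYOUT)
print(len(functions), len(set(functions)))
print(area_of(HOLISTIC_IPC_LAYOUT, "Collision warning system"))
```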


Figure 10.2 Holistic HMI concept, showing: IPC display 12”; SW commands; left stalk 
commands; buttons; knobs. 

Figure 10.3 Holistic HMI layout. 

Figure 10.6 Holistic HMI layout with the rear view camera in the central area. 

Figure 10.8 (A-B-C-D) Holistic HMI left area with: lane departure warning, collision 
warning, rear approaching vehicle system, pedestrian safety system. 


10.4.2 Concept 2: Immersive HMI 


The second concept is totally different from the previous one. While the 
Holistic HMI concept centralizes all the information and the interaction with the driver 
in front of him/her, the Immersive HMI concept distributes the interaction 
along the dashboard and the windscreen. 

The HMI elements of concept 2 are listed as follows: i) 3.5” IPC display; 
ii) Touch Display 8.5” in the dashboard; iii) Head-up display for the 
windscreen; iv) SW commands; v) Left stalk commands; vi) Buttons; vii) Knobs. 

In concept 2 the area dedicated to the hazard warnings was moved 
to the middle of the instrument panel cluster, while the navigation, the rear 
view camera, the night vision system, radio/multimedia, phone and menu 
applications were moved to the dashboard display. The head-up display 
delivers traffic sign recognition and lane change assist information on the 
windscreen. 

Figure 10.9 Immersive HMI concept, showing: 3.5" IPC display; touch display 8.5" in the 
dashboard; head-up display for the windscreen; SW commands; left stalk commands; buttons; 
knobs. 


10.4.3 Concept 3: Smart HMI 


The third concept replaces the dashboard display with a nomadic device (ND, 
i.e. smartphone/tablet). The HMI can reconfigure itself according to the ND size. 

The IPC display has the same structure as that of concept 2. The 
difference is that in the Smart HMI concept the 3.5” display of concept 2 is 
complemented by, for example, a 7" tablet (as in Figure 10.7) seamlessly 
connected with the car system. Drivers just connect the phone with a cable and 
immediately gain access to the ND applications using the dashboard/steering 
wheel buttons. The ND can also provide access to further automotive 
applications. The driver can define what kind of information has to be shown on 
the ND: the ND is able to manage the infotainment functions and some ADAS 
applications. 


Figure 10.10 Immersive HMI concept: instrument panel cluster display. 

Figure 10.12 Immersive HMI concept: head-up display details. 


The HMI elements of concept 3 are listed as follows: i) Display 3.5” in 
the IPC; ii) Touch Display of the nomadic device set into the dashboard; 
iii) SW commands; iv) Left stalk commands; v) Buttons; vi) Knobs. 


10.5 Preliminary Testing by Focus Group 


As Morgan described, “in essence, focus groups are special occasions devoted 
to gathering data on specific topics” [21]. A focus group makes it possible to evaluate 
preliminary concepts, and in this case it is a useful technique to evaluate 
the proposals explained before [28], [35], bearing in mind that the focus group 
is a technique widely used in the automotive field to evaluate user experience 
regarding HMI concepts [2, 9-12, 17]. 

Figure 10.13 Smart HMI concept. 

Figure 10.14 Smart HMI concept: Nomadic device with night vision system. 


10.5.1 Participants 


The sample is composed of 7 participants aged between 25 and 39 
years (M = 31.71; SD = 4.06). Around 30% drive between 10000 and 15000 
km/year and around 45% more than 20000 km/year. All participants drove at least 
once a day during the last year. 2 participants do less than 10% of their total 
driving in the city, another 2 between 20 and 25%, and 3 do 40% or more of their 
driving in the city. Moreover, around 40% usually drive on dual carriageways, 
30% on highways and, similarly, 30% on main roads. 


10.5.2 Results 


Participants discussed and exchanged points of view about the HMI. They gave 
scores for degree of utility, ease of use, ease of learning, visual clarity, 
intuitiveness of the concepts, degree of accessibility and degree of driver 
annoyance, and finally they provided a global value. 

HMI concept 1 (with explicit drowsiness) was considered very 
useful, fairly easy to use, and with the highest visual clarity and degree of 
accessibility among all the options presented. Having information located 
in the same area is positive for avoiding distraction, and the three delimited 
areas for presenting information are pleasant. In general, the alternative to 
concept 1 (with implicit drowsiness) is less appreciated than the original one. 
Scores are lower than for the previous concept and the drowsiness icon is 
missed by focus group participants. 

Regarding HMI concept 2 (with explicit drowsiness), most of the participants 
appreciated having information on the HUD; moreover, having primary 
information in a different place from secondary information is a positive attribute. 
Besides, this concept seems to be a bit easier to use and to learn and more 
intuitive. Concept 2 bis (with implicit drowsiness) is rated as intuitive 
and as having visual clarity. Once more, HUD information is well appreciated 
by focus group participants. However, it should be taken into 
account that drivers are not confident about managing drowsiness without a 
detailed icon. 

Concept 3 (with explicit drowsiness) is not really appreciated from an 
aesthetic point of view. It is fairly useful, easy to use and learn and 
fairly intuitive. Focus group participants liked the possibility to place the tablet 
where they prefer, although it could mean less frontal vision. The last concept 
(concept 3 with implicit drowsiness) proved to be the least acceptable one for 
participants. Although it is positive to be able to place the tablet according 
to the driver's wishes, the general impression of having information in this way 
is not positive, also bearing in mind that there is no drowsiness icon. 


10.5.3 List of the Winning Features and Redesign 
Recommendations 


As can be observed in the radar chart which summarizes the HMI evaluation 
for the six concepts, concept 3 and its alternative, concept 3 bis (with implicit 
drowsiness), were the least valued concepts. Concept 3 bis is also the concept 
considered the most annoying. Concept 2 had the highest average score for 
the global evaluation, but concept 1 is close to concept 2, which adds information 
on a HUD and a touch dashboard display. Concept 1 stands out for its 
accessibility, utility and visual clarity, while concept 2 stands out for being 
easy to use and a bit more intuitive. 

Figure 10.15 Radar chart summarizing the HMI evaluation for the 6 HMI concepts. Bis concepts 
are concepts 1, 2, 3 with implicit drowsiness. 

During the session participants pointed out several issues that should be taken 
into account: 


• In summary, the best option should have a drowsiness icon. 

• Concept 1 and concept 2 are the best options. 

• The possibility to have HUD information is really appreciated. 

• Participants suggested having the following information in the HUD: traffic 
signs, gap for ACC, navigation system (with arrows and distances). 

• Traffic sign information is very important to them: it should remain 
available because drivers sometimes forget it 
(e.g. when driving along a road and forgetting what the speed 
limit was). 

• Information should be very clear and concise. 

• It would be a great idea to have the possibility to select where to display 
the navigation system. 


10.6 Users Test at Driving Simulator 


As a final step in the definition of the overall HMI concept, the two winning
options from the focus group, namely concept 1 and concept 2 with the explicit
drowsiness icon, were tested with users on a driving simulator in order
to identify the final DESERVE HMI concept configuration. Each user was
interviewed alone by a usability expert, who gathered comments and suggestions
about the disposition and visualization of the different ADAS functions.

Among the 13 ADAS functions developed for the DESERVE project, it
was decided to test only 4 ADAS functions that were considered representative
of the main HMI concept logic. In particular, the following ADAS functions
were extensively tested with users:


1. Forward collision warning - with acoustic signal type 1. 

2. Rear view camera system. 

3. Lane change assistance system - with acoustic signal type 1. 
4. Drowsiness icon - with acoustic signal type 2. 


10.6.1 Participants 


The sample was composed of 30 participants (20 male and 10 female) aged
between 23 and 62 years (M = 32.17; SD = 7.15). The majority of
participants held a Master's degree.

30% of participants drive more than 20,000 km/year and the remainder
between 10,000 and 15,000 km/year (M = 15,600 km/year; SD = 6,931.18).


10.6.2 Procedure 


After a brief explanation of the test objective and some questions on personal
data, users were asked to sit in the driving simulator and imagine being inside
their own car, in the driver's seat, with a dashboard in front of them
displaying information about the car, its functioning and so on. Before
assessing the solutions, users were asked to practice a little with the driving
simulator and to count the stars that appeared on the road.

In particular, users were asked to evaluate on a 7-point scale:


• The suitability of the HMI concept tested;

• The comprehensibility of the information displayed;

• The amount of information displayed;

• The pleasantness, from a graphical point of view, of the HMI concept
tested.


10.6.3 Results 


From the analysis of the different parts of the HMI concept test, concept 1
seems to be the preferred one, even if the difference from the percentage of
users who prefer concept 2 is not statistically significant. Despite this
result, 60% of users would like to have the warning information in the central
part of the display instead of in the lateral part. The representation of the
functions seems quite clear to all users; only the adaptive light control and
adaptive cruise control icons should be redesigned. In the task that asked
users to build their own solution, almost all of them placed all functions in
the same central display.

Figure 10.18 Final DESERVE HMI concept: rear view camera.

Figure 10.19 Final DESERVE HMI concept: navigation.

Thanks to the users' feedback, the final DESERVE HMI concept has a single
display with the warning functions in the central area and the gauges in the
lateral part of the display.


10.7 Conclusions 


Most cars today contain heterogeneous ADAS that support safe and clean
driving. Because the pattern of factors in the automotive domain is constantly
changing (new technologies and devices on board, new infrastructure, new
mobility concepts, new trends in pollution prevention), the accident
characteristics of the transport domain are also changing. As a consequence,
research in that domain has also changed perspective and started to investigate
the human factor in order to improve safety and prevent accidents. Even if it
is not feasible to predict the next accident exactly, it is possible to
anticipate some decisive characteristics of future accidents, such as driver
misbehaviour. All these features contribute to defining a new concept of ADAS
as a support, and sometimes as a partner, for drivers during task
accomplishment, and no longer as a mere substitute.

Since more and more ADAS functions are being implemented in current vehicles,
the need for a unified Human Machine Interface is becoming an issue that
reflects the increasing complexity of the entire system, whereby the driver has
to deal with different devices and different interaction strategies. The aim of
this work was to identify the most suitable HMI concepts that allow easy
integration of different ADAS functions, in order to guarantee the safety of
the introduction of any new element.


11.1 Introduction


Testing vehicular functions can be a very tedious task. The classical approach
tackles this problem with a multi-stage validation and testing process. The
first step is a Model-In-the-Loop (MIL) approach, which allows quick
algorithmic development without involving dedicated hardware. Usually, this
level of development involves high-level abstraction software frameworks
running on general-purpose computers. The second step is Software-In-the-Loop
(SIL) validation, where the actual implementation of the developed
model is evaluated on general-purpose hardware. This step requires a
complete software implementation very close to the final one. The last step
of this validation process is Hardware-In-the-Loop (HIL), which involves the
final hardware running the final software, with inputs and outputs connected
to a simulator. This proven process is very widely used in the transportation
industry and has enabled the development of very high-quality components,
which are then integrated into bigger systems or vehicles. Modern vehicles,
however, integrate so many such components that the integration phase has
become more complex and also requires a multi-step validation process. The
final integration tests are performed on tracks or roads. While mandatory,
these real-condition tests are limited by multiple factors and have a very
high cost.

Testing a complex system like a modern vehicle on a test track or on a
real road involves complex and costly engineering. First of all, to be testable
the vehicle must be fully or nearly fully functional. This limits the testing
opportunity to a very late stage in the development process and implies
high engineering costs. Moreover, because the real-condition test is very
constrained in time and space, the test coverage is not complete and only
a very small variety of real-world conditions can be tested.

To address these limitations and lower the cost, modern ADAS (Advanced
Driver Assistance Systems) development frameworks use a virtual test bench
approach, where realistic simulator software and hardware are used to enable
faster and less expensive tests with better coverage on complete vehicles.
In this document, we propose a virtual testing system built on a chassis
dynamometer, which enables complex test scenarios to be applied early in
the ADAS development process.

Our proposed system, named SERBER (Simulateur d'Environnement
Routier intégré à un Banc de test véhicule pour l'Evaluation de stratégies de
gestion de l'éneRgie embarquée), aims to ease the testing of ADAS prototypes
and, at the same time, to analyze the energy efficiency of the prototype system
using the standard equipment of the chassis dynamometer. A previous version of
this system was published in [3], which presented the SERBER system
and showed preliminary results.


11.2 State of the Art 


In the automotive industry, car manufacturers use different ways to test and
validate ADAS and other embedded systems. An extensive study of the state
of the art in ADAS testing and validation methods can be found in [1]. These
test methods can be grouped into two categories: test-bench tests and
in-vehicle tests.

For test-bench tests, three approaches are usually used during the development
cycle: Model-In-the-Loop (MIL), Software-In-the-Loop (SIL) and
Hardware-In-the-Loop (HIL). In MIL, a model of the developed system is
integrated in a simulation loop with models of vehicle dynamics, sensors,
actuators and traffic environment. After successful MIL validation, the SIL
approach replaces the tested model with a real software implementation for
real-time operation validation. The last step, HIL, combines simulated and real
components in order to validate the functionality of the developed system in
both its hardware and software aspects.

Test-bench tests are very useful as they provide a safe, repeatable and
reliable way to validate these embedded systems under a variety of operating
conditions. This kind of test also has some drawbacks. For example, the
interaction with other ADAS is difficult to test, as is the integration
into the vehicle system. An example of a HIL test bench for complex ADAS can
be found in [2].
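
As an illustration of the MIL stage described above, the following minimal
sketch closes a simulation loop between a controller model and a vehicle
dynamics model. It is not taken from [1] or [2]: the proportional speed
controller, the point-mass vehicle with linear drag, and all parameter values
are assumptions chosen only to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class PointMassVehicle:
    """Hypothetical longitudinal vehicle model: point mass with linear drag."""
    mass_kg: float = 1200.0
    drag_coeff: float = 30.0  # N per (m/s); assumed linear for simplicity
    speed_mps: float = 0.0

    def step(self, drive_force_n: float, dt_s: float) -> float:
        """Advance the model by one explicit Euler step and return the speed."""
        accel = (drive_force_n - self.drag_coeff * self.speed_mps) / self.mass_kg
        self.speed_mps = max(0.0, self.speed_mps + accel * dt_s)
        return self.speed_mps


def speed_controller(target_mps: float, measured_mps: float,
                     gain: float = 800.0) -> float:
    """Model under test (assumed): a proportional speed controller."""
    return gain * (target_mps - measured_mps)


def run_mil(target_mps: float, duration_s: float, dt_s: float = 0.01) -> float:
    """Run the closed loop and return the final vehicle speed."""
    vehicle = PointMassVehicle()
    speed = vehicle.speed_mps
    for _ in range(int(duration_s / dt_s)):
        force = speed_controller(target_mps, speed)  # model -> plant
        speed = vehicle.step(force, dt_s)            # plant -> model
    return speed
```

In a SIL setup, `speed_controller` would be replaced by the real software
implementation while the rest of the loop stays unchanged.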

The second category of test methods is in-vehicle tests. These tests
require a prototype vehicle in which to integrate the developed system. Again,
three approaches are commonly used: test drives on test tracks, test drives on
open roads and Vehicle-Hardware-In-the-Loop (VeHIL).

The first two approaches are very similar and assume that the prototype is
driven in real conditions. The test track allows control of some environment
parameters (traffic, some weather conditions, road signs, road type and so on)
but requires large infrastructure. Open-road tests require less dedicated
infrastructure but are of limited use because of the difficulty of reproducing
the needed conditions and the underlying safety problems. Both of these methods
are costly and time-consuming and cannot be used early in the development
cycle, because they require heavy engineering effort to obtain a fully
functional prototype to drive.

A very interesting solution, which combines nearly all the advantages of the
previous methods without most of their drawbacks, is the VeHIL approach. This
kind of test is a combination of the HIL and test-drive approaches. Functional
as well as integration tests can be done easily and early in the development
cycle. As the vehicle is physically locked on the chassis-dynamometer, this
system greatly improves the safety of the tests. Because it is an indoor
test, every environmental parameter (humidity, ambient light, temperature
and so on) can easily be controlled, and thus the repeatability of the test is
ensured.

Existing VeHIL systems, like the one described in [1] and currently
used by [2], rely on mobile platforms (called Mobile Bases) to move
targets (fake cars and pedestrians) in front of the tested vehicle in order to
trigger the various embedded functions (pedestrian detection, ACC, AEB
and so on). This setup, however, needs heavy infrastructure: the
chassis-dynamometer is installed in a very large room (200 x 40 m) and the
targets are moved at high speed by the Mobile Bases, which can be dangerous
for both the tested vehicle and the persons involved. Thus, the tests
are executed remotely from a control room and the test area has to be
evacuated.


11.3 Proposed System


To address the problems of existing VeHIL systems (large infrastructure,
fast-moving targets, hazards for people), we propose a system which associates
a chassis dynamometer with multi-sensor road environment simulation
software. The simulator uses a description of the virtual environment and
the position of the vehicle to generate multi-sensor data. These data are
then fed into the sensors of the real car placed on the chassis dynamometer.
In the other direction, motion data (speed, acceleration) are gathered from the
chassis-dynamometer and used to update the simulated vehicle speed and
position.

Our system, as seen in Figure 11.1, is mainly composed of three parts: the
chassis-dynamometer, multi-sensor simulation software running on a computer,
and devices that feed the vehicle sensors with synthetic data, such as an LCD
screen and the CAN bus interface. The chassis-dynamometer is standard
equipment; the main requirement is to be able to connect it to the simulation
computer in order to read the actual vehicle speed and to control the simulated
slope by adjusting the friction force applied by the dynamometer. The
simulation software is at the core of our system and is responsible for
generating the sensor data to be fed into the ADAS sensors. We use the
Pro-Sivic software, which is dedicated to this kind of application. An
introduction to Pro-Sivic can be found in [4]. The difficulties in our proposed
system lie mainly in the way to fool (i.e. feed synthetic data into) the
vehicle's sensors with the data produced by the simulation software.

Figure 11.1 Overview of the SERBER VeHIL system.
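
The data exchange between the three parts can be summarized as one iteration
of a loop. The sketch below is only a schematic view of that exchange: the
`Dynamometer` and `RoadSimulator` interfaces and their method names are
hypothetical and do not correspond to the actual Horiba, Pro-Sivic or RTMAPS
interfaces.

```python
from typing import Protocol


class Dynamometer(Protocol):
    """Hypothetical wrapper around the chassis-dynamometer I/O."""
    def read_speed_mps(self) -> float: ...
    def set_slope_percent(self, slope: float) -> None: ...


class RoadSimulator(Protocol):
    """Hypothetical wrapper around the multi-sensor simulation software."""
    def advance(self, speed_mps: float, dt_s: float) -> None: ...
    def slope_percent(self) -> float: ...
    def sensor_frames(self) -> dict: ...


def loop_step(dyno: Dynamometer, sim: RoadSimulator, dt_s: float) -> dict:
    """One iteration of the vehicle-in-the-loop cycle."""
    speed = dyno.read_speed_mps()                # real vehicle -> simulation
    sim.advance(speed, dt_s)                     # move the virtual vehicle
    dyno.set_slope_percent(sim.slope_percent())  # simulation -> real vehicle
    return sim.sensor_frames()                   # data to feed to the sensors
```

Keeping the two sides behind narrow interfaces like these would allow either
one to be replaced by a stub when the loop logic is tested on its own.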

Sensors can be fooled in three ways. The first (full simulation) is
to completely disconnect the ADAS and replace it with an electronic probe
controlled by our system, which simulates the ADAS behavior completely.
The simulated data (ADAS outputs) are sent directly onto the vehicle's internal
communication bus (CAN) to be used by the other vehicular functions. The
second (sensor simulation) is to disconnect only the sensor part of the
ADAS and replace it with an electronic probe. The simulator generates data
according to the specification of the simulated sensor. The “signal processing”
part of the ADAS is kept in the loop, so it can be tested by the system.
This approach, however, requires the sensor to be separable from the main
ADAS unit. The last (stimulation) is to keep the full ADAS in the loop
and send physical stimuli to the ADAS sensor through dedicated hardware.
For example, an LCD screen can be placed in front of an embedded camera,
or a hyper-frequency generator can send signals to an embedded RADAR
sensor.

This last solution is the preferred one, as it keeps the whole ADAS in the
testing loop and limits the modifications made to the vehicle. The objective
of our work is therefore to be able to simulate and fool every vehicle sensor.
This approach, however, is very difficult to achieve for some kinds of sensors,
like inertial sensors and environmental sensors, or needs very complex
stimulation hardware for RADAR and LIDAR.

With such a hardware-in-the-loop system, multiple scenarios can be
implemented and tested in the safety and convenience of an indoor workshop.
This system can be used for prototyping new ADAS, as it is very easy to produce
test cases for the specific system under development. It can also be used to
test the integration of multiple ADAS in a car, using a set of predefined test
cases to validate their interaction. Finally, it can be used for the
development of very complex ADAS or fully automated vehicles, where the
embedded system relies simultaneously on multiple sensors to operate, because
it is able to simulate nearly every aspect of the road environment at the same
time.

Moreover, the use of a chassis dynamometer allows simultaneous
analysis of various performance indicators of the vehicle, including energy
consumption and pollution. This coupling is a real benefit compared to
traditional test setups and enables early evaluation of the impact on energy
consumption of various changes in the ADAS. For example,
the fuel consumption and pollution of a car equipped with Adaptive Cruise
Control (ACC) can be continuously monitored as various ACC algorithms are
developed and tested.


11.4 Hardware Implementation


The proposed system is implemented at the IRSEEM facilities. A
chassis-dynamometer is available on one of our technological platforms and is
used as a building block. This chassis dynamometer is a Horiba Vulcan 4WD with
two independent axles. It provides real-time velocity information based on
the real vehicle wheel speed. It can also apply a friction force equivalent to
a 5% slope of the road. The control system allows interfacing through analog
inputs and outputs; these are used to control the friction force to simulate
the slope and to read the actual speed of the vehicle in real time. A complete
description of the chassis-dynamometer used is available from the Horiba
website [5]. We use an analog input/output device from National Instruments
to link the simulation computer with the chassis-dynamometer control
system.
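
To give an order of magnitude for the slope simulation, the force that a road
grade exerts on a vehicle of mass m is F = m g sin(arctan(slope)). The sketch
below evaluates this relation and maps the result to an analog command. It
rests on several assumptions that are not stated in the text: the 5% figure is
interpreted as an upper limit, downhill slopes are treated as zero resistive
force, and the linear 0 to 10 V scaling of the analog output is purely
hypothetical.

```python
import math

G_MPS2 = 9.81              # gravitational acceleration
MAX_SLOPE_PERCENT = 5.0    # assumed upper limit, from the 5% figure in the text


def slope_force_n(slope_percent: float, vehicle_mass_kg: float) -> float:
    """Resistive force equivalent to a road grade, clamped to [0 %, 5 %]."""
    clamped = min(max(slope_percent, 0.0), MAX_SLOPE_PERCENT)
    angle_rad = math.atan(clamped / 100.0)
    return vehicle_mass_kg * G_MPS2 * math.sin(angle_rad)


def force_to_voltage(force_n: float, max_force_n: float,
                     full_scale_v: float = 10.0) -> float:
    """Hypothetical linear mapping of a force command to an analog voltage."""
    ratio = min(max(force_n / max_force_n, 0.0), 1.0)
    return full_scale_v * ratio
```

For a 1000 kg vehicle on a 5% grade this gives a force of roughly 490 N.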

One of the main challenges in our proposal is the ability to generate
synthetic data and feed the ADAS sensors with these data. The synthetic data
generation for vision-based, RADAR-based and LIDAR-based sensors is
handled by Pro-Sivic. The main problem is how to correctly stimulate the
sensors in order to feed them with the simulation data.


11.4.1 Sensors Stimulation Solutions 


Feeding sensors with simulated data is a key function of our system and a
complex challenge. A vehicle can embed numerous sensors like cameras, inertial
sensors, temperature sensors, rain sensors, odometer, LIDAR, RADAR, GPS
and more. Because of the broad variety of sensor types, different approaches
are needed to be able to control what the sensor reports to the ADAS processor.

For camera-based sensors, we use direct stimulation with a standard
computer display placed in front of the camera. A first successful test was
done with a 32" LCD screen. A display system using a projector would allow
a bigger image surface and is currently being tested. Image field of view and
distortions have to be taken into account for accurate stimulation of the
ADAS sensor. Special care must be taken to completely cover the
sensor’s field of view. This is especially difficult with wide-angle or
fish-eye cameras and would require a special setup.
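
The coverage constraint can be estimated with simple geometry: a flat screen
centered on the optical axis at distance d covers a horizontal field of view
FOV only if its width is at least 2 d tan(FOV/2). The sketch below evaluates
this bound. It assumes an ideal pinhole camera without distortion and a 16:9
screen; both are simplifications, and the function names are illustrative.

```python
import math


def min_screen_width_m(distance_m: float, horizontal_fov_deg: float) -> float:
    """Smallest screen width that fills the horizontal field of view."""
    return 2.0 * distance_m * math.tan(math.radians(horizontal_fov_deg) / 2.0)


def screen_width_m(diagonal_in: float, aspect_w: float = 16.0,
                   aspect_h: float = 9.0) -> float:
    """Physical width of a screen from its diagonal in inches (16:9 assumed)."""
    return diagonal_in * 0.0254 * aspect_w / math.hypot(aspect_w, aspect_h)


def max_distance_m(width_m: float, horizontal_fov_deg: float) -> float:
    """Largest camera-to-screen distance at which the screen still fills the view."""
    return width_m / (2.0 * math.tan(math.radians(horizontal_fov_deg) / 2.0))
```

Under these assumptions a 32" screen is about 0.71 m wide, so a camera with a
hypothetical 40° horizontal field of view would have to be placed within
roughly 1 m of it. The bound grows quickly with the field of view, which is
consistent with the difficulty noted for wide-angle and fish-eye cameras.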

The rain sensor can easily be triggered using a localized water diffusion
device (sprinkler) actuated by a solenoid valve. This system can also be used
to generate rain-like perturbations on camera sensors by directly applying
water to the windshield. However, such solutions can produce perturbations
which are not reproducible.




GPS simulation devices already exist for factory tests and are able to
generate a controlled fake position to be interpreted by nearby GPS receivers.
This kind of device could easily be integrated with our system to provide
real-time positioning to the vehicle and to embedded ADAS that use GPS as a
source of information. These systems are, however, costly; direct transmission
of generated NMEA frames to the ADAS is far more cost-effective, but needs
a small modification of the vehicle under test.
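
Generating NMEA frames in software is straightforward. As a minimal sketch,
the code below builds an NMEA 0183 GGA sentence from a simulated position. The
checksum rule (XOR of all characters between "$" and "*") is part of the NMEA
format; the fix quality, satellite count, HDOP and geoid separation are
placeholder values chosen for this example, and the function names are
illustrative.

```python
from functools import reduce


def nmea_checksum(body: str) -> str:
    """XOR of all characters between '$' and '*', as two uppercase hex digits."""
    return f"{reduce(lambda acc, ch: acc ^ ord(ch), body, 0):02X}"


def to_nmea_angle(angle_deg: float, is_latitude: bool):
    """Convert signed decimal degrees to (d)ddmm.mmmm plus hemisphere letter."""
    if is_latitude:
        hemisphere = "N" if angle_deg >= 0 else "S"
    else:
        hemisphere = "E" if angle_deg >= 0 else "W"
    # Round in minutes first so the minutes field can never print as 60.0000.
    total_minutes = round(abs(angle_deg) * 60.0, 4)
    degrees, minutes = divmod(total_minutes, 60.0)
    width = 2 if is_latitude else 3
    return f"{int(degrees):0{width}d}{minutes:07.4f}", hemisphere


def gga_sentence(utc_hhmmss: str, lat_deg: float, lon_deg: float,
                 alt_m: float, n_sats: int = 8, hdop: float = 1.0) -> str:
    """Build a GGA sentence; fix quality 1 and geoid separation 0.0 are assumed."""
    lat, ns = to_nmea_angle(lat_deg, True)
    lon, ew = to_nmea_angle(lon_deg, False)
    body = (f"GPGGA,{utc_hhmmss},{lat},{ns},{lon},{ew},"
            f"1,{n_sats:02d},{hdop:.1f},{alt_m:.1f},M,0.0,M,,")
    return f"${body}*{nmea_checksum(body)}\r\n"
```

The resulting string can be written to whatever serial link replaces the GPS
receiver output; that part depends on the vehicle and is not shown.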

Likewise, real-time target generators for various types of RADAR (24 and
76 GHz) are available as off-the-shelf components. These systems can also
be coupled with the real-time simulation software to report the position and
speed of simulated actors to the vehicle.

A recent paper [6] shows a possible implementation of a target simulator
for LIDAR sensors. In this paper, a pulse generator is synchronized with the
LIDAR in order to inject false object echoes. Fully functional real-time target
simulators, however, have yet to be demonstrated. Such a setup could be used in
our system in the same way as the RADAR target simulator described above.

Inertial sensors are not covered yet. These sensors are usually deeply
embedded in the ECU and can be difficult to physically disconnect. One option
is to physically move the vehicle using external actuators, but this implies
heavy equipment. Another option is to open the ECU and physically replace
the sensor with an electronic probe, which is time-consuming and difficult to
achieve without complete documentation.

For ultrasonic range-finders, two main solutions are possible. The first
one is to use a sound generator simulating echoes. The other one is to use
small mobile targets located directly in front of the sensors. As these sensors
are usually used only for low-speed maneuvers and short-range detection, these
mobile systems would not require a big infrastructure and can safely be used
even in the presence of people.

Recently, Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I)
communication has become widespread with the 802.11p standard. These systems
are used as a kind of virtual sensor providing the position, relative speed and
status of the vehicles in the vicinity. Because of their mode of operation,
such systems are easy to connect to a computer. In our system we used one
802.11p modem to generate synthetic CAM and DENM messages to be interpreted by
the vehicle under test.

Feeding the sensors with simulated data is not an easy task, and each sensor
has to be addressed differently. We plan to use sensor stimulation whenever
possible, and to fall back to sensor simulation and sending data on the CAN
bus when stimulation is not feasible. Some kinds of sensors appear to be
relatively easy to stimulate (like cameras); others need very complex and
costly equipment (GPS, RADAR and inertial sensors).

For all these sensors, an alternative approach would be to have cooperative
software embedded in the ECU which would allow actual measurements to be
overwritten through the CAN bus (or another communication medium).
While this solution seems unlikely to be possible on production vehicles,
prototypes and test vehicles can be equipped with such debugging software,
enabling a controlled and effective way to bypass sensors and feed synthetic
data straight to the embedded processors.
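
As an illustration of what such an override message could look like, the
sketch below packs a sensor identifier and a replacement value into an 8-byte
CAN payload. The CAN identifier, the payload layout and the function names are
entirely hypothetical; a real debugging interface would be defined together
with the ECU supplier, and the transmission on the bus is not shown.

```python
import struct

OVERRIDE_CAN_ID = 0x7F0  # hypothetical identifier reserved for overrides

# Assumed layout, little-endian: uint8 sensor id, 3 padding bytes, float32 value.
_OVERRIDE_FORMAT = "<B3xf"


def pack_override(sensor_id: int, value: float) -> bytes:
    """Encode an override request as an 8-byte CAN data field."""
    return struct.pack(_OVERRIDE_FORMAT, sensor_id, value)


def unpack_override(payload: bytes):
    """Decode the payload back into (sensor_id, value), as the ECU side would."""
    sensor_id, value = struct.unpack(_OVERRIDE_FORMAT, payload)
    return sensor_id, value
```

A fixed 8-byte layout fits a classic CAN data frame, which is why a single
float32 value per message is used here.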


11.4.2 Software Implementation 


Our software runs on a high-end laptop computer and is based on two main
building blocks: multi-sensor simulation software (Pro-Sivic) and a real-time
middleware (RTMAPS). A block diagram of the complete system is presented
in Figure 11.2.

To run the simulations, we used Pro-Sivic from Civitec. This real-time
multi-sensor simulation software is a fusion between a driving simulator and
a multi-sensor simulator. Pro-Sivic provides kinetic data and sensor data from
the simulated vehicle and can also be used as a driving simulator. A complete
description of Pro-Sivic is given in [4]. Pro-Sivic is able to generate
realistic video output which can be directly used to stimulate camera-based
ADAS. A sample view of Pro-Sivic video output can be seen in Figure 11.3.

Figure 11.2 Block diagram of the SERBER system.

The other main software building block, RTMAPS from Intempora, is
a middleware which interconnects all other parts of the system. It is also
used to produce CAN messages to be sent on the vehicle bus and to perform
other implementation-specific operations. RTMAPS is a component-based
graphical programming framework for easily building multi-task or distributed
applications. This software is described in detail in [7] and in this book
(Part I, Chapter 4). RTMAPS provides native interfaces to several simulation
software packages, including Pro-Sivic, and also numerous components for device
support (CAN Peak, serial GPS, National Instruments I/O device and so on).

The most significant part of the RTMAPS diagram is presented in
Figure 11.4. The main task of this diagram is to handle the communication
between Pro-Sivic and the chassis-dynamometer, and to generate
Vehicle-to-Vehicle and Vehicle-to-Infrastructure communication messages based
on the simulation data.


Figure 11.3 Sample video output of Pro-Sivic.

Figure 11.4 RTMAPS diagram of the system (extract).


Our chassis-dynamometer has a hardware limitation: the vehicle's front
wheels cannot turn while the vehicle is moving, or damage can occur. To
prevent this, the test vehicle's steering wheel is physically removed
and replaced by a USB joystick connected to the computer. This allows
lateral control of the virtual vehicle by the driver, in a way very similar to
driving simulators, while the physical wheels stay in line with the chassis
dynamometer.
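
One simple way to combine the two inputs is a kinematic bicycle model, in
which the measured wheel speed drives the longitudinal motion and the joystick
axis sets the steering angle. The sketch below illustrates the idea only: the
text does not state which vehicle model is used inside Pro-Sivic, and the
wheelbase, the maximum steering angle and the sign convention are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x_m: float = 0.0
    y_m: float = 0.0
    heading_rad: float = 0.0


def step_virtual_vehicle(pose: Pose, speed_mps: float, joystick_axis: float,
                         dt_s: float, wheelbase_m: float = 2.5,
                         max_steer_rad: float = 0.5) -> Pose:
    """Advance the virtual vehicle by one time step (kinematic bicycle model).

    speed_mps comes from the dynamometer; joystick_axis is in [-1, 1], with
    positive values assumed to steer towards increasing heading.
    """
    steer = max(-1.0, min(1.0, joystick_axis)) * max_steer_rad
    yaw_rate = speed_mps * math.tan(steer) / wheelbase_m
    return Pose(
        x_m=pose.x_m + speed_mps * math.cos(pose.heading_rad) * dt_s,
        y_m=pose.y_m + speed_mps * math.sin(pose.heading_rad) * dt_s,
        heading_rad=pose.heading_rad + yaw_rate * dt_s,
    )
```

Because the speed is taken from the real wheels, the virtual vehicle stops
turning as soon as the real vehicle stops, whatever the joystick position.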


11.5 Experimental Setup 


In order to test our system, we equipped a small fully electric vehicle with
an after-market ADAS: a Mobileye 560. This ADAS, designed to be
installed on the windshield, is based on a forward-looking camera and an
integrated processor which performs real-time image processing. The main
unit contains the camera and a processing device, and a separate display is
used to inform the driver of the working state of the system and to show
warnings. A Bluetooth connection allows a dedicated application on a
smartphone or tablet to display various data in addition to those already
shown on the small display. A picture of the system is shown in Figure 11.5,
where the main unit is shown on the right, the small display in the middle, and
a smartphone running the dedicated application on the left.

The Mobileye system is able to detect and track many objects: pedestrians,
other vehicles, speed-limit signs, and white lines. The position of the tracked
objects, as well as the vehicle speed information gathered from the CAN bus,
is used to detect dangerous situations and to warn the driver: risk of
pedestrian collision, risk of forward collision, lane departure, over-speed and
so on. All the processing is done inside the Mobileye main unit and only
high-level information is available through a small display. The Mobileye
system is described in detail in [8] and up-to-date information is available
in [9].

Figure 11.5 Mobileye 560 aftermarket vision-based ADAS.

The V2V communication test bench is composed of two Cohda Wireless
MK2 802.11p modems equipped with MobileMark SMW-303 multiband
antennas. One of the modems is used to send CAM and DENM messages
generated from the data of the virtual vehicles. The other modem is used as an
embedded unit in the vehicle under test. The data received by this second modem
are used to update a dashboard HMI. An extract of the RTMAPS diagram
responsible for the communication task is shown in Figure 11.6.

Figure 11.6 RTMAPS diagram of the V2V task.

Figure 11.7 The Biocar test vehicle on the Horiba chassis dynamometer.


The test vehicle equipped with the Mobileye is placed on the chassis
dynamometer, and an LCD screen is placed in front of the windshield, in
the sight of both the driver and the Mobileye system. Figure 11.7 shows
a view of the test vehicle installed on the chassis-dynamometer. The LCD
screen can be seen in front of the car.

The whole system was tested with an urban scenario and environment.
This scenario is composed of a few roads with some buildings and trees; the
traffic is simulated with four cars following a predefined path. The virtual
car can move freely inside this environment and is directed by the actions of
the driver. A view of the urban scenario is shown in Figure 11.8.


11.6 Results 


A first series of results has been obtained with the described experimental
setup. The forward motion of the virtual car is completely controlled by the
real vehicle controls (accelerator and brake pedals), while the lateral control
is obtained from the USB steering device connected to the computer.

First, the integration of the chassis-dynamometer with the simulation
software was tested. The real car speed is read and used to update the virtual
vehicle motion. In Pro-Sivic, the road slope under the vehicle is computed, and
this information is used to control the resistive torque applied by the
chassis-dynamometer to the real vehicle. During the tests, the driver can feel
the resistive torque applied by the system to the vehicle wheels when climbing
a slope, and has a feeling of free wheels when going down. Figure 11.9
shows a picture taken near the driver's seat. Driving the car is natural and
intuitive, just as if the car were on a real road. The driving simulator use
case is not the main goal of this system, but this first test demonstrates the
value of the SERBER system even for ADAS which involve driver interaction.

Figure 11.8 Overview of the urban environment in Pro-Sivic.

The ADAS sensor stimulation abilities of the system were tested using the
Mobileye. This test showed promising results, as the Mobileye was fooled by
the simulation and worked as if the car were running on a real road. The
lane departure warning and forward collision warning were triggered
by the corresponding simulated situations. Figure 11.10 shows the lane
departure warning being triggered when the car crosses the central line of the
road with the blinkers off. In this picture, the road is clearly seen on the
LCD screen in the top right part. In the bottom left part, a tablet running the
Mobileye application shows a graphical representation of the warning being
triggered.

Figure 11.10 Lane departure warning triggered.


The last tested functionality is the V2V communication simulation. In this
test, four virtual vehicles are simulated and their global positions are
broadcast by the 802.11p modem using CAM messages. Various DENM messages
are also broadcast by virtual vehicles. Another modem is embedded in the
vehicle under test and receives these messages. An HMI is used to display these
data to the driver, using a RADAR-like circular representation. A snapshot of
the HMI is shown in Figure 11.11.

Figure 11.11 V2V Communication HMI.
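
A circular display of this kind requires the global positions received in CAM
messages to be expressed relative to the vehicle under test. A possible
conversion is sketched below. It assumes that positions are available as
latitude and longitude in degrees and uses a flat-earth (equirectangular)
approximation, which is only adequate over the short ranges involved; the
function name is hypothetical and this is not the code of the HMI shown in
Figure 11.11.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def relative_polar(ego_lat_deg: float, ego_lon_deg: float,
                   ego_heading_deg: float,
                   target_lat_deg: float, target_lon_deg: float):
    """Return (range in m, bearing in degrees relative to the ego heading).

    Headings and bearings are measured clockwise from north; the relative
    bearing is wrapped to [-180, 180), positive to the right of the ego vehicle.
    """
    d_north = math.radians(target_lat_deg - ego_lat_deg) * EARTH_RADIUS_M
    d_east = (math.radians(target_lon_deg - ego_lon_deg) * EARTH_RADIUS_M
              * math.cos(math.radians(ego_lat_deg)))
    distance = math.hypot(d_north, d_east)
    bearing = math.degrees(math.atan2(d_east, d_north))
    relative = (bearing - ego_heading_deg + 180.0) % 360.0 - 180.0
    return distance, relative
```

The pair (range, relative bearing) maps directly to polar coordinates on the
display, with the vehicle under test at the center and its heading pointing up.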


11.7 Conclusion and Future Work 


In this chapter, we have presented SERBER, our Vehicle-Hardware-In-the-Loop
system, which uses a chassis-dynamometer and multi-sensor simulation
software to create a kind of virtual reality platform for intelligent vehicles
equipped with ADAS that use sensors to gather information from the surrounding
environment. The combination of the simulation software and the
chassis-dynamometer makes it possible to apply the force resulting from a
simulated slope to the real vehicle, while the sensor data generated by the
simulation software are fed into the ADAS.

We discussed different ways in which the system can feed simulated data to
sensors, both at the communication-bus level (CAN messages) and at the
physical-stimuli level.

We described our current implementation based on Pro-Sivic, RTMAPS
and a Horiba chassis-dynamometer and presented the first results obtained
in a complete test using a small electric car equipped with an after-market
camera-based ADAS and an 802.11p modem. The results presented here
show the ability to fool an ADAS based on a forward-looking
camera. Various functions of the ADAS are triggered when the corresponding
situations are simulated: forward collision warning and lane departure
warning.

The DESERVE project aims to provide an environment for ADAS design,
development and pre-validation. In this context, SERBER provides a virtual
testing platform enabling early tests of newly designed ADAS with realistic
scenarios and testing environments. This system can also be used to validate
the interaction of multiple ADAS on the same vehicle and aims to be a complete
test and validation system for fully autonomous vehicles.

SERBER is more compact and simpler to use than other VeHIL systems
which use mobile bases to move fake cars at high speed in order to simulate
the motion of other vehicles. In fact, our system can easily be installed on a
standard chassis-dynamometer, provided that it can be controlled by software,
and requires only minor physical modification of the facility.

The work presented in this chapter is a first step towards a complete 
simulation system able to stimulate multiple sensors in the tested vehicle. 
Currently, only camera-based ADAS and V2X communication systems can 
be stimulated. 

The first area of improvement for the current system is the simulation
and stimulation of additional sensors. A RADAR virtual target generator is
currently being developed in order to fool RADAR-based ADAS like Adaptive
Cruise Control (ACC) and Automatic Emergency Braking (AEB). A LIDAR
target generator and a GPS simulator can be integrated to provide a fairly
complete setup able to test realistic scenarios.

A second area of improvement is in the simulation environment and
scenarios. To be able to test corner cases and complex interactions between
various ADAS functions, sophisticated scenarios involving various road
environments, pedestrians, other vehicles and driver behavior have to be
designed and implemented.